diff --git a/example/transform/data/raw_input/paper_1_raw.md b/example/transform/data/raw_input/paper_1_raw.md new file mode 100644 index 00000000..db126aea --- /dev/null +++ b/example/transform/data/raw_input/paper_1_raw.md @@ -0,0 +1,19 @@ +# The Impact of Artificial Intelligence on Healthcare: A Review + +## Abstract +Artificial intelligence (AI) has emerged as a transformative technology in healthcare, offering opportunities to improve patient outcomes, streamline processes, and enhance decision-making. This paper provides an overview of the current state of AI in healthcare, explores its applications, benefits, challenges, and future prospects. + +## Introduction +In recent years, artificial intelligence has gained significant traction across various industries, and healthcare is no exception. With advancements in machine learning, natural language processing, and robotics, AI has the potential to revolutionize healthcare delivery, diagnosis, treatment, and management. This paper aims to delve into the role of AI in healthcare, highlighting its implications, challenges, and future directions. + +## Background +The integration of AI into healthcare systems has been facilitated by the exponential growth of data, coupled with the development of sophisticated algorithms. Machine learning algorithms, such as deep learning, support vector machines, and random forests, enable healthcare providers to analyze large datasets, identify patterns, and extract actionable insights. Furthermore, natural language processing techniques empower AI systems to interpret and generate human language, facilitating tasks such as clinical documentation, medical coding, and patient communication. + +## Approach +To examine the impact of AI on healthcare, we conducted a comprehensive literature review, analyzing research articles, industry reports, and case studies. 
We focused on key applications of AI in healthcare, including disease diagnosis, personalized treatment planning, drug discovery, remote patient monitoring, and predictive analytics. Additionally, we explored the challenges associated with AI adoption in healthcare, such as data privacy concerns, regulatory barriers, algorithm bias, and interoperability issues. + +## Experiment/Result +Our analysis revealed that AI holds immense promise for transforming healthcare delivery and improving patient outcomes. AI-powered diagnostic systems demonstrate high accuracy and efficiency in detecting various medical conditions, ranging from cancer and cardiovascular diseases to infectious diseases and neurological disorders. Moreover, AI-driven predictive analytics enable healthcare providers to anticipate disease outbreaks, optimize resource allocation, and enhance population health management. Despite these advancements, several challenges hinder the widespread adoption of AI in healthcare, including data quality issues, algorithmic bias, ethical considerations, and regulatory constraints. + +## Conclusion/Future Work +Looking ahead, future research should focus on addressing the technical, ethical, and regulatory challenges associated with AI in healthcare. Efforts to enhance the interpretability, fairness, and transparency of AI algorithms are critical to building trust among healthcare professionals and patients. Moreover, interdisciplinary collaboration between computer scientists, healthcare professionals, policymakers, and ethicists is essential to develop robust frameworks for AI governance and ensure responsible AI deployment in healthcare settings. Additionally, longitudinal studies are needed to assess the long-term impact of AI on patient outcomes, healthcare costs, and healthcare disparities. 
By addressing these challenges and leveraging the full potential of AI, we can unlock new opportunities for advancing healthcare delivery, enhancing clinical decision-making, and ultimately improving the quality of care for patients worldwide. diff --git a/example/transform/data/raw_input/paper_2_raw.md b/example/transform/data/raw_input/paper_2_raw.md new file mode 100644 index 00000000..bac64311 --- /dev/null +++ b/example/transform/data/raw_input/paper_2_raw.md @@ -0,0 +1,21 @@ +# The Impact of Renewable Energy Adoption on Global Carbon Emissions: An Analytical Study + +## Abstract +This paper examines the impact of renewable energy adoption on global carbon emissions. With climate change posing a significant threat to the environment and human societies, transitioning to renewable energy sources has become a crucial global initiative. This study analyzes the correlation between increased use of renewable energy sources, such as wind, solar, and hydro, and the subsequent changes in carbon emissions worldwide. Utilizing data from various countries over the past two decades, we employ statistical models to assess the effectiveness of renewable energy in reducing carbon footprints. Our findings suggest that renewable energy adoption is a viable strategy for significantly reducing global carbon emissions, highlighting the need for policies that support renewable energy investments and infrastructure development. + +## Introduction +Climate change remains one of the most pressing challenges of our time, with carbon emissions from fossil fuel consumption being a primary contributor. The transition to renewable energy sources is widely viewed as a vital step towards mitigating climate change impacts. This paper explores the effectiveness of renewable energy adoption in reducing global carbon emissions. 
By examining data from multiple countries, we aim to provide a comprehensive analysis of how renewable energy usage influences carbon emission trends and to evaluate the potential of renewable energy as a sustainable solution to climate change. + +## Background +The relationship between human activities, especially the burning of fossil fuels, and climate change is well-documented. Renewable energy sources offer an alternative that does not emit carbon dioxide during operation, thus presenting a potential pathway to decarbonize the energy sector. Governments and organizations worldwide have made commitments to increase the share of renewables in their energy mix. This paper builds on existing research by analyzing more recent data to understand the current impact of renewable energy adoption on carbon emissions. + +## Approach +Our approach involves collecting and analyzing data on renewable energy consumption and carbon emissions from various countries over the last twenty years. We focus on wind, solar, and hydroelectric power due to their significant growth and potential for large-scale implementation. The study employs statistical analysis methods to identify trends, correlations, and causations between the adoption of renewable energy and changes in carbon emissions. We adjust for factors such as economic growth, population changes, and energy efficiency improvements to isolate the impact of renewable energy. + +## Experiment/Result +The analysis reveals a clear negative correlation between the adoption of renewable energy sources and carbon emissions in countries with aggressive renewable energy policies. For instance, countries that have doubled their renewable energy consumption in the past decade have seen, on average, a 10% reduction in carbon emissions, even after accounting for economic and population growth. 
These findings are consistent across developed and developing nations, suggesting that renewable energy can be an effective tool for reducing carbon emissions globally. + +## Conclusion/Future Work +The study confirms that renewable energy adoption plays a crucial role in reducing global carbon emissions. The findings support the need for policies and investments that encourage the development and deployment of renewable energy technologies. Future work should focus on longitudinal studies to track the long-term impact of renewable energy adoption on carbon emissions. Additionally, further research is needed to explore the socio-economic benefits of transitioning to renewable energy, such as job creation, health improvements, and energy security, to provide a more comprehensive understanding of its impacts. + + diff --git a/example/transform/google_paper_comparison_model.ipynb b/example/transform/google_paper_comparison_model.ipynb new file mode 100644 index 00000000..6f8d323e --- /dev/null +++ b/example/transform/google_paper_comparison_model.ipynb @@ -0,0 +1,1404 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Notebook for Paper Comparison Flow\n", + "\n", + "In this example, we will show you how to compare two papers from given markdown files using Google's models via uniflow.\n", + "\n", + "### Before running the code\n", + "\n", + "You will need the `uniflow` conda environment to run this notebook. You can set up the environment by following the installation instructions: https://github.com/CambioML/uniflow/tree/main#installation.\n", + "\n", + "Next, you will need a valid [Google API key](https://ai.google.dev/tutorials/setup) to run the code. Once you have the key, set it as the environment variable `GOOGLE_API_KEY` within a `.env` file in the root directory of this repository. 
For more details, see these [instructions](https://github.com/CambioML/uniflow/tree/main#api-keys).\n", + "\n", + "### Update system path" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "%reload_ext autoreload\n", + "%autoreload 2\n", + "\n", + "import sys\n", + "\n", + "sys.path.append(\".\")\n", + "sys.path.append(\"..\")\n", + "sys.path.append(\"../..\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Import dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "c:\\Users\\Pumpkinfries\\anaconda3\\envs\\uniflow\\lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + }, + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from dotenv import load_dotenv\n", + "from IPython.display import display\n", + "\n", + "from uniflow.flow.client import TransformClient\n", + "from uniflow.flow.flow_factory import FlowFactory\n", + "from uniflow.flow.config import TransformConfig\n", + "from uniflow.op.model.model_config import GoogleModelConfig\n", + "\n", + "from uniflow.viz import Viz\n", + "from uniflow.op.prompt import Context\n", + "\n", + "load_dotenv()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Display the different flows" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'extract': ['ExtractHTMLFlow',\n", + " 'ExtractImageFlow',\n", + " 'ExtractIpynbFlow',\n", + " 'ExtractMarkdownFlow',\n", + " 'ExtractPDFFlow',\n", + " 'ExtractTxtFlow',\n", + " 
'ExtractGmailFlow'],\n", + " 'transform': ['TransformAzureOpenAIFlow',\n", + " 'TransformCopyFlow',\n", + " 'TransformGoogleFlow',\n", + " 'TransformGoogleMultiModalModelFlow',\n", + " 'TransformHuggingFaceFlow',\n", + " 'TransformLMQGFlow',\n", + " 'TransformOpenAIFlow',\n", + " 'TransformComparisonGoogleFlow',\n", + " 'TransformComparisonOpenAIFlow'],\n", + " 'rater': ['RaterFlow']}" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "FlowFactory.list()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Prepare Sample Prompts\n", + "Use preprocessed raw markdowns for now" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['# The Impact of Artificial Intelligence on Healthcare: A Review\\n\\n## Abstract\\nArtificial intelligence (AI) has emerged as a transformative technology in healthcare, offering opportunities to improve patient outcomes, streamline processes, and enhance decision-making. This paper provides an overview of the current state of AI in healthcare, explores its applications, benefits, challenges, and future prospects.\\n\\n## Introduction\\nIn recent years, artificial intelligence has gained significant traction across various industries, and healthcare is no exception. With advancements in machine learning, natural language processing, and robotics, AI has the potential to revolutionize healthcare delivery, diagnosis, treatment, and management. This paper aims to delve into the role of AI in healthcare, highlighting its implications, challenges, and future directions.\\n\\n## Background\\nThe integration of AI into healthcare systems has been facilitated by the exponential growth of data, coupled with the development of sophisticated algorithms. 
Machine learning algorithms, such as deep learning, support vector machines, and random forests, enable healthcare providers to analyze large datasets, identify patterns, and extract actionable insights. Furthermore, natural language processing techniques empower AI systems to interpret and generate human language, facilitating tasks such as clinical documentation, medical coding, and patient communication.\\n\\n## Approach\\nTo examine the impact of AI on healthcare, we conducted a comprehensive literature review, analyzing research articles, industry reports, and case studies. We focused on key applications of AI in healthcare, including disease diagnosis, personalized treatment planning, drug discovery, remote patient monitoring, and predictive analytics. Additionally, we explored the challenges associated with AI adoption in healthcare, such as data privacy concerns, regulatory barriers, algorithm bias, and interoperability issues.\\n\\n## Experiment/Result\\nOur analysis revealed that AI holds immense promise for transforming healthcare delivery and improving patient outcomes. AI-powered diagnostic systems demonstrate high accuracy and efficiency in detecting various medical conditions, ranging from cancer and cardiovascular diseases to infectious diseases and neurological disorders. Moreover, AI-driven predictive analytics enable healthcare providers to anticipate disease outbreaks, optimize resource allocation, and enhance population health management. Despite these advancements, several challenges hinder the widespread adoption of AI in healthcare, including data quality issues, algorithmic bias, ethical considerations, and regulatory constraints.\\n\\n## Conclusion/Future Work\\nLooking ahead, future research should focus on addressing the technical, ethical, and regulatory challenges associated with AI in healthcare. 
Efforts to enhance the interpretability, fairness, and transparency of AI algorithms are critical to building trust among healthcare professionals and patients. Moreover, interdisciplinary collaboration between computer scientists, healthcare professionals, policymakers, and ethicists is essential to develop robust frameworks for AI governance and ensure responsible AI deployment in healthcare settings. Additionally, longitudinal studies are needed to assess the long-term impact of AI on patient outcomes, healthcare costs, and healthcare disparities. By addressing these challenges and leveraging the full potential of AI, we can unlock new opportunities for advancing healthcare delivery, enhancing clinical decision-making, and ultimately improving the quality of care for patients worldwide.\\n',\n", + " '# The Impact of Renewable Energy Adoption on Global Carbon Emissions: An Analytical Study\\n\\n## Abstract\\nThis paper examines the impact of renewable energy adoption on global carbon emissions. With climate change posing a significant threat to the environment and human societies, transitioning to renewable energy sources has become a crucial global initiative. This study analyzes the correlation between increased use of renewable energy sources, such as wind, solar, and hydro, and the subsequent changes in carbon emissions worldwide. Utilizing data from various countries over the past two decades, we employ statistical models to assess the effectiveness of renewable energy in reducing carbon footprints. Our findings suggest that renewable energy adoption is a viable strategy for significantly reducing global carbon emissions, highlighting the need for policies that support renewable energy investments and infrastructure development.\\n\\n## Introduction\\nClimate change remains one of the most pressing challenges of our time, with carbon emissions from fossil fuel consumption being a primary contributor. 
The transition to renewable energy sources is widely viewed as a vital step towards mitigating climate change impacts. This paper explores the effectiveness of renewable energy adoption in reducing global carbon emissions. By examining data from multiple countries, we aim to provide a comprehensive analysis of how renewable energy usage influences carbon emission trends and to evaluate the potential of renewable energy as a sustainable solution to climate change.\\n\\n## Background\\nThe relationship between human activities, especially the burning of fossil fuels, and climate change is well-documented. Renewable energy sources offer an alternative that does not emit carbon dioxide during operation, thus presenting a potential pathway to decarbonize the energy sector. Governments and organizations worldwide have made commitments to increase the share of renewables in their energy mix. This paper builds on existing research by analyzing more recent data to understand the current impact of renewable energy adoption on carbon emissions.\\n\\n## Approach\\nOur approach involves collecting and analyzing data on renewable energy consumption and carbon emissions from various countries over the last twenty years. We focus on wind, solar, and hydroelectric power due to their significant growth and potential for large-scale implementation. The study employs statistical analysis methods to identify trends, correlations, and causations between the adoption of renewable energy and changes in carbon emissions. We adjust for factors such as economic growth, population changes, and energy efficiency improvements to isolate the impact of renewable energy.\\n\\n## Experiment/Result\\nThe analysis reveals a clear negative correlation between the adoption of renewable energy sources and carbon emissions in countries with aggressive renewable energy policies. 
For instance, countries that have doubled their renewable energy consumption in the past decade have seen, on average, a 10% reduction in carbon emissions, even after accounting for economic and population growth. These findings are consistent across developed and developing nations, suggesting that renewable energy can be an effective tool for reducing carbon emissions globally.\\n\\n## Conclusion/Future Work\\nThe study confirms that renewable energy adoption plays a crucial role in reducing global carbon emissions. The findings support the need for policies and investments that encourage the development and deployment of renewable energy technologies. Future work should focus on longitudinal studies to track the long-term impact of renewable energy adoption on carbon emissions. Additionally, further research is needed to explore the socio-economic benefits of transitioning to renewable energy, such as job creation, health improvements, and energy security, to provide a more comprehensive understanding of its impacts.\\n\\n\\n']" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "with open(r\"data/raw_input/paper_1_raw.md\", 'r') as file:\n", + " paper_1_content = file.read()\n", + "\n", + "with open(r\"data/raw_input/paper_2_raw.md\", 'r') as file:\n", + " paper_2_content = file.read()\n", + "\n", + "raw_context_input = [\n", + " paper_1_content,\n", + " paper_2_content,\n", + "]\n", + "raw_context_input" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, for the given raw text strings `raw_context_input` above, we convert them to the `Context` class to be processed by `uniflow`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Run Comparison Flow\n", + "In this example, we use the default `TransformConfig` settings with `GoogleModelConfig` to compare the two papers section by section." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "data = [[Context(Context=raw_context_input[0]), Context(Context=raw_context_input[1])]]\n", + "\n", + "config = TransformConfig(\n", + " flow_name=\"TransformComparisonGoogleFlow\",\n", + " model_config=GoogleModelConfig()\n", + ")\n", + "client = TransformClient(config)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 1/1 [01:24<00:00, 84.53s/it]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1-Abstract\n", + "\n", + "**Similarities:**\n", + "\n", + "* Both papers discuss the potential benefits of AI in healthcare.\n", + "* Both papers acknowledge the ethical concerns associated with AI in healthcare.\n", + "* Both papers emphasize the importance of collaboration between researchers, healthcare professionals, and policymakers in the integration of AI into healthcare.\n", + "\n", + "**Differences:**\n", + "\n", + "* **Topic:** Paper A focuses specifically on the applications of AI in healthcare, while Paper B focuses on the impact of renewable energy adoption on global carbon emissions.\n", + "* **Scope:** Paper A provides a comprehensive overview of the current landscape of AI in healthcare, while Paper B presents a specific study on the correlation between renewable energy adoption and carbon emissions.\n", + "* **Methodology:** Paper A does not explicitly mention the methodology used in the research, while Paper B describes the data sources and statistical models used in the study.\n", + "* **Findings:** Paper A discusses the potential benefits of AI in various areas of healthcare, while Paper B presents empirical data and modeling results to demonstrate the impact of renewable energy adoption on carbon emissions.\n", + "* **Conclusion:** Paper A emphasizes the need for ethical guidelines and 
responsible use of AI in healthcare, while Paper B calls for increased investment in renewable energy technologies and supportive policies to accelerate the transition to a sustainable energy future.\n", + "2-Introduction\n", + "\n", + "**Similarities:**\n", + "\n", + "* Both papers address pressing global issues: AI in healthcare (Paper A) and climate change (Paper B).\n", + "* Both papers aim to explore the potential of technological advancements (AI and renewable energy) in addressing these challenges.\n", + "* Both papers intend to provide insights into the implications, challenges, and future directions of their respective topics.\n", + "\n", + "**Differences:**\n", + "\n", + "* **Topic:** Paper A focuses on the role of AI in healthcare, while Paper B examines the effectiveness of renewable energy in mitigating climate change.\n", + "* **Methodology:** Paper A does not specify a specific methodology, while Paper B mentions analyzing data from multiple countries to assess the relationship between renewable energy usage and carbon emission trends.\n", + "* **Scope:** Paper A appears to have a broader scope, exploring the potential of AI in various aspects of healthcare, while Paper B has a more specific focus on the impact of renewable energy on carbon emissions.\n", + "* **Target Audience:** Paper A is likely aimed at researchers and healthcare professionals interested in the applications of AI in healthcare, while Paper B is likely targeted at environmental scientists, policymakers, and stakeholders involved in climate change mitigation efforts.\n", + "3-Background\n", + "\n", + "**Similarities:**\n", + "\n", + "* Both papers focus on the application of advanced technologies to address real-world problems.\n", + "* Both papers acknowledge the importance of data availability and analysis in their respective fields.\n", + "\n", + "**Differences:**\n", + "\n", + "* **Topic:** Paper A focuses on the use of AI in healthcare, while Paper B focuses on the use of 
renewable energy in mitigating climate change.\n", + "* **Methodology:** Paper A discusses the use of machine learning and natural language processing techniques, while Paper B mentions the analysis of data to assess the impact of renewable energy adoption.\n", + "* **Scope:** Paper A provides a broad overview of AI applications in healthcare, while Paper B focuses on a specific aspect of renewable energy adoption (its impact on carbon emissions).\n", + "* **Target Audience:** Paper A is likely aimed at healthcare professionals and researchers, while Paper B is likely aimed at policymakers, energy experts, and environmental scientists.\n", + "4-Approach\n", + "\n", + "**Similarities:**\n", + "\n", + "* Both papers involve a comprehensive literature review and data analysis.\n", + "* Both papers focus on emerging technologies and their impact on specific domains (healthcare in paper A, renewable energy in paper B).\n", + "\n", + "**Differences:**\n", + "\n", + "* **Research Focus:** Paper A examines the impact of AI on healthcare, while Paper B investigates the relationship between renewable energy consumption and carbon emissions.\n", + "* **Data Sources:** Paper A analyzes research articles, industry reports, and case studies, while Paper B collects and analyzes data from various countries over 20 years.\n", + "* **Methodology:** Paper A primarily uses qualitative analysis, while Paper B employs statistical analysis methods.\n", + "* **Challenges:** Paper A highlights challenges associated with AI adoption in healthcare, such as data privacy and algorithm bias, while Paper B does not explicitly discuss challenges related to renewable energy adoption.\n", + "* **Scope:** Paper A focuses on the applications and challenges of AI in healthcare, while Paper B examines the broader impact of renewable energy on carbon emissions.\n", + "5-Experiment or Result\n", + "\n", + "**Similarities:**\n", + "\n", + "* Both papers discuss the potential benefits of advanced 
technologies in their respective fields.\n", + "* Both papers acknowledge challenges that hinder the widespread adoption of these technologies.\n", + "\n", + "**Differences:**\n", + "\n", + "* **Topic:** Paper A focuses on the application of AI in healthcare, while Paper B examines the relationship between renewable energy adoption and carbon emissions.\n", + "* **Methodology:** Paper A discusses the potential and challenges of AI in healthcare, while Paper B presents an analysis of data to support its claims.\n", + "* **Scope:** Paper A provides a broad overview of AI in healthcare, while Paper B focuses on the specific relationship between renewable energy and carbon emissions.\n", + "* **Data:** Paper A does not provide specific data or evidence to support its claims, while Paper B presents an analysis of data from multiple countries.\n", + "* **Conclusion:** Paper A concludes by highlighting the need to address challenges for the widespread adoption of AI in healthcare, while Paper B concludes by emphasizing the potential of renewable energy as a global solution for reducing carbon emissions.\n", + "6-Conclusion or Future work\n", + "\n", + "**Similarities:**\n", + "\n", + "* Both papers emphasize the need for further research to address challenges and optimize the use of technology.\n", + "* Both papers highlight the importance of interdisciplinary collaboration and responsible deployment of technology.\n", + "* Both papers acknowledge the potential benefits of technology in improving outcomes and reducing disparities.\n", + "\n", + "**Differences:**\n", + "\n", + "* **Focus:** Paper A focuses on the challenges and opportunities of AI in healthcare, while Paper B focuses on the role of renewable energy in mitigating carbon emissions.\n", + "* **Scope:** Paper A discusses technical, ethical, and regulatory aspects of AI in healthcare, while Paper B focuses on the environmental and socio-economic impacts of renewable energy adoption.\n", + "* **Methodology:** 
Paper A suggests longitudinal studies to assess the long-term impact of AI, while Paper B proposes longitudinal studies to monitor the long-term effects of renewable energy adoption.\n", + "* **Implications:** Paper A emphasizes the need for robust frameworks for AI governance, while Paper B highlights the importance of policies and investments to promote renewable energy development.\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "text/plain": [ + "'1-Abstract\\n\\n**Similarities:**\\n\\n* Both papers discuss the potential benefits of AI in healthcare.\\n* Both papers acknowledge the ethical concerns associated with AI in healthcare.\\n* Both papers emphasize the importance of collaboration between researchers, healthcare professionals, and policymakers in the integration of AI into healthcare.\\n\\n**Differences:**\\n\\n* **Topic:** Paper A focuses specifically on the applications of AI in healthcare, while Paper B focuses on the impact of renewable energy adoption on global carbon emissions.\\n* **Scope:** Paper A provides a comprehensive overview of the current landscape of AI in healthcare, while Paper B presents a specific study on the correlation between renewable energy adoption and carbon emissions.\\n* **Methodology:** Paper A does not explicitly mention the methodology used in the research, while Paper B describes the data sources and statistical models used in the study.\\n* **Findings:** Paper A discusses the potential benefits of AI in various areas of healthcare, while Paper B presents empirical data and modeling results to demonstrate the impact of renewable energy adoption on carbon emissions.\\n* **Conclusion:** Paper A emphasizes the need for ethical guidelines and responsible use of AI in healthcare, while Paper B calls for increased investment in renewable energy technologies and supportive policies to accelerate the transition to a sustainable energy 
future.\\n\\n2-Introduction\\n\\n**Similarities:**\\n\\n* Both papers address pressing global issues: AI in healthcare (Paper A) and climate change (Paper B).\\n* Both papers aim to explore the potential of technological advancements (AI and renewable energy) in addressing these challenges.\\n* Both papers intend to provide insights into the implications, challenges, and future directions of their respective topics.\\n\\n**Differences:**\\n\\n* **Topic:** Paper A focuses on the role of AI in healthcare, while Paper B examines the effectiveness of renewable energy in mitigating climate change.\\n* **Methodology:** Paper A does not specify a specific methodology, while Paper B mentions analyzing data from multiple countries to assess the relationship between renewable energy usage and carbon emission trends.\\n* **Scope:** Paper A appears to have a broader scope, exploring the potential of AI in various aspects of healthcare, while Paper B has a more specific focus on the impact of renewable energy on carbon emissions.\\n* **Target Audience:** Paper A is likely aimed at researchers and healthcare professionals interested in the applications of AI in healthcare, while Paper B is likely targeted at environmental scientists, policymakers, and stakeholders involved in climate change mitigation efforts.\\n\\n3-Background\\n\\n**Similarities:**\\n\\n* Both papers focus on the application of advanced technologies to address real-world problems.\\n* Both papers acknowledge the importance of data availability and analysis in their respective fields.\\n\\n**Differences:**\\n\\n* **Topic:** Paper A focuses on the use of AI in healthcare, while Paper B focuses on the use of renewable energy in mitigating climate change.\\n* **Methodology:** Paper A discusses the use of machine learning and natural language processing techniques, while Paper B mentions the analysis of data to assess the impact of renewable energy adoption.\\n* **Scope:** Paper A provides a broad overview of AI 
applications in healthcare, while Paper B focuses on a specific aspect of renewable energy adoption (its impact on carbon emissions).\\n* **Target Audience:** Paper A is likely aimed at healthcare professionals and researchers, while Paper B is likely aimed at policymakers, energy experts, and environmental scientists.\\n\\n4-Approach\\n\\n**Similarities:**\\n\\n* Both papers involve a comprehensive literature review and data analysis.\\n* Both papers focus on emerging technologies and their impact on specific domains (healthcare in paper A, renewable energy in paper B).\\n\\n**Differences:**\\n\\n* **Research Focus:** Paper A examines the impact of AI on healthcare, while Paper B investigates the relationship between renewable energy consumption and carbon emissions.\\n* **Data Sources:** Paper A analyzes research articles, industry reports, and case studies, while Paper B collects and analyzes data from various countries over 20 years.\\n* **Methodology:** Paper A primarily uses qualitative analysis, while Paper B employs statistical analysis methods.\\n* **Challenges:** Paper A highlights challenges associated with AI adoption in healthcare, such as data privacy and algorithm bias, while Paper B does not explicitly discuss challenges related to renewable energy adoption.\\n* **Scope:** Paper A focuses on the applications and challenges of AI in healthcare, while Paper B examines the broader impact of renewable energy on carbon emissions.\\n\\n5-Experiment or Result\\n\\n**Similarities:**\\n\\n* Both papers discuss the potential benefits of advanced technologies in their respective fields.\\n* Both papers acknowledge challenges that hinder the widespread adoption of these technologies.\\n\\n**Differences:**\\n\\n* **Topic:** Paper A focuses on the application of AI in healthcare, while Paper B examines the relationship between renewable energy adoption and carbon emissions.\\n* **Methodology:** Paper A discusses the potential and challenges of AI in healthcare, 
while Paper B presents an analysis of data to support its claims.\\n* **Scope:** Paper A provides a broad overview of AI in healthcare, while Paper B focuses on the specific relationship between renewable energy and carbon emissions.\\n* **Data:** Paper A does not provide specific data or evidence to support its claims, while Paper B presents an analysis of data from multiple countries.\\n* **Conclusion:** Paper A concludes by highlighting the need to address challenges for the widespread adoption of AI in healthcare, while Paper B concludes by emphasizing the potential of renewable energy as a global solution for reducing carbon emissions.\\n\\n6-Conclusion or Future work\\n\\n**Similarities:**\\n\\n* Both papers emphasize the need for further research to address challenges and optimize the use of technology.\\n* Both papers highlight the importance of interdisciplinary collaboration and responsible deployment of technology.\\n* Both papers acknowledge the potential benefits of technology in improving outcomes and reducing disparities.\\n\\n**Differences:**\\n\\n* **Focus:** Paper A focuses on the challenges and opportunities of AI in healthcare, while Paper B focuses on the role of renewable energy in mitigating carbon emissions.\\n* **Scope:** Paper A discusses technical, ethical, and regulatory aspects of AI in healthcare, while Paper B focuses on the environmental and socio-economic impacts of renewable energy adoption.\\n* **Methodology:** Paper A suggests longitudinal studies to assess the long-term impact of AI, while Paper B proposes longitudinal studies to monitor the long-term effects of renewable energy adoption.\\n* **Implications:** Paper A emphasizes the need for robust frameworks for AI governance, while Paper B highlights the importance of policies and investments to promote renewable energy development.\\n\\n'" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "output = 
client.run(data)\n", + "paragraph = ''\n", + "label_list = [\"1-Abstract\", \"2-Introduction\", \"3-Background\", \"4-Approach\", \"5-Experiment or Result\", \"6-Conclusion or Future work\"]\n", + "\n", + "for inner_output, cat in zip(output[0]['output'], label_list):\n", + " print(cat + '\\n\\n' + inner_output['response'][0])\n", + " paragraph += cat + '\\n\\n' + inner_output['response'][0] + '\\n\\n' \n", + "paragraph" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### View the output\n", + "\n", + "Let's take a look at the generated output." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Plot model flow graph\n", + "Here, we visualize the model flow graph for the `ModelFlow`." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'output': [{'response': ['**Similarities:**\\n\\n* Both papers discuss the potential benefits of AI in healthcare.\\n* Both papers acknowledge the ethical concerns associated with AI in healthcare.\\n* Both papers emphasize the importance of collaboration between researchers, healthcare professionals, and policymakers in the integration of AI into healthcare.\\n\\n**Differences:**\\n\\n* **Topic:** Paper A focuses specifically on the applications of AI in healthcare, while Paper B focuses on the impact of renewable energy adoption on global carbon emissions.\\n* **Scope:** Paper A provides a comprehensive overview of the current landscape of AI in healthcare, while Paper B presents a specific study on the correlation between renewable energy adoption and carbon emissions.\\n* **Methodology:** Paper A does not explicitly mention the methodology used in the research, while Paper B describes the data sources and statistical models used in the study.\\n* **Findings:** Paper A discusses the potential benefits of AI in various areas of healthcare, while Paper B presents empirical data and modeling results 
to demonstrate the impact of renewable energy adoption on carbon emissions.\\n* **Conclusion:** Paper A emphasizes the need for ethical guidelines and responsible use of AI in healthcare, while Paper B calls for increased investment in renewable energy technologies and supportive policies to accelerate the transition to a sustainable energy future.'],\n", + " 'error': 'No errors.'},\n", + " {'response': ['**Similarities:**\\n\\n* Both papers address pressing global issues: AI in healthcare (Paper A) and climate change (Paper B).\\n* Both papers aim to explore the potential of technological advancements (AI and renewable energy) in addressing these challenges.\\n* Both papers intend to provide insights into the implications, challenges, and future directions of their respective topics.\\n\\n**Differences:**\\n\\n* **Topic:** Paper A focuses on the role of AI in healthcare, while Paper B examines the effectiveness of renewable energy in mitigating climate change.\\n* **Methodology:** Paper A does not specify a specific methodology, while Paper B mentions analyzing data from multiple countries to assess the relationship between renewable energy usage and carbon emission trends.\\n* **Scope:** Paper A appears to have a broader scope, exploring the potential of AI in various aspects of healthcare, while Paper B has a more specific focus on the impact of renewable energy on carbon emissions.\\n* **Target Audience:** Paper A is likely aimed at researchers and healthcare professionals interested in the applications of AI in healthcare, while Paper B is likely targeted at environmental scientists, policymakers, and stakeholders involved in climate change mitigation efforts.'],\n", + " 'error': 'No errors.'},\n", + " {'response': ['**Similarities:**\\n\\n* Both papers focus on the application of advanced technologies to address real-world problems.\\n* Both papers acknowledge the importance of data availability and analysis in their respective 
fields.\\n\\n**Differences:**\\n\\n* **Topic:** Paper A focuses on the use of AI in healthcare, while Paper B focuses on the use of renewable energy in mitigating climate change.\\n* **Methodology:** Paper A discusses the use of machine learning and natural language processing techniques, while Paper B mentions the analysis of data to assess the impact of renewable energy adoption.\\n* **Scope:** Paper A provides a broad overview of AI applications in healthcare, while Paper B focuses on a specific aspect of renewable energy adoption (its impact on carbon emissions).\\n* **Target Audience:** Paper A is likely aimed at healthcare professionals and researchers, while Paper B is likely aimed at policymakers, energy experts, and environmental scientists.'],\n", + " 'error': 'No errors.'},\n", + " {'response': ['**Similarities:**\\n\\n* Both papers involve a comprehensive literature review and data analysis.\\n* Both papers focus on emerging technologies and their impact on specific domains (healthcare in paper A, renewable energy in paper B).\\n\\n**Differences:**\\n\\n* **Research Focus:** Paper A examines the impact of AI on healthcare, while Paper B investigates the relationship between renewable energy consumption and carbon emissions.\\n* **Data Sources:** Paper A analyzes research articles, industry reports, and case studies, while Paper B collects and analyzes data from various countries over 20 years.\\n* **Methodology:** Paper A primarily uses qualitative analysis, while Paper B employs statistical analysis methods.\\n* **Challenges:** Paper A highlights challenges associated with AI adoption in healthcare, such as data privacy and algorithm bias, while Paper B does not explicitly discuss challenges related to renewable energy adoption.\\n* **Scope:** Paper A focuses on the applications and challenges of AI in healthcare, while Paper B examines the broader impact of renewable energy on carbon emissions.'],\n", + " 'error': 'No errors.'},\n", + " {'response': 
['**Similarities:**\\n\\n* Both papers discuss the potential benefits of advanced technologies in their respective fields.\\n* Both papers acknowledge challenges that hinder the widespread adoption of these technologies.\\n\\n**Differences:**\\n\\n* **Topic:** Paper A focuses on the application of AI in healthcare, while Paper B examines the relationship between renewable energy adoption and carbon emissions.\\n* **Methodology:** Paper A discusses the potential and challenges of AI in healthcare, while Paper B presents an analysis of data to support its claims.\\n* **Scope:** Paper A provides a broad overview of AI in healthcare, while Paper B focuses on the specific relationship between renewable energy and carbon emissions.\\n* **Data:** Paper A does not provide specific data or evidence to support its claims, while Paper B presents an analysis of data from multiple countries.\\n* **Conclusion:** Paper A concludes by highlighting the need to address challenges for the widespread adoption of AI in healthcare, while Paper B concludes by emphasizing the potential of renewable energy as a global solution for reducing carbon emissions.'],\n", + " 'error': 'No errors.'},\n", + " {'response': ['**Similarities:**\\n\\n* Both papers emphasize the need for further research to address challenges and optimize the use of technology.\\n* Both papers highlight the importance of interdisciplinary collaboration and responsible deployment of technology.\\n* Both papers acknowledge the potential benefits of technology in improving outcomes and reducing disparities.\\n\\n**Differences:**\\n\\n* **Focus:** Paper A focuses on the challenges and opportunities of AI in healthcare, while Paper B focuses on the role of renewable energy in mitigating carbon emissions.\\n* **Scope:** Paper A discusses technical, ethical, and regulatory aspects of AI in healthcare, while Paper B focuses on the environmental and socio-economic impacts of renewable energy adoption.\\n* **Methodology:** Paper A 
suggests longitudinal studies to assess the long-term impact of AI, while Paper B proposes longitudinal studies to monitor the long-term effects of renewable energy adoption.\\n* **Implications:** Paper A emphasizes the need for robust frameworks for AI governance, while Paper B highlights the importance of policies and investments to promote renewable energy development.'],\n", + " 'error': 'No errors.'}],\n", + " 'root': }]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "output" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "graph = Viz.to_digraph(output[0]['root'])" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "[Figure: model flow graph. Edges (all nodes under thread_0/): root -> expand_to_node_from_nodes_{1,2}; expand_to_node_from_nodes_1 -> split_to_chunks_{1..7}; expand_to_node_from_nodes_2 -> split_to_chunks_{8..14}; split_to_chunks_k -> google_model_label_k and google_model_summary_k; label and summary nodes -> summaries_groupby_labels_{1..12}; summaries_groupby_labels_j and summaries_groupby_labels_{j+6} -> reduce_to_pairs_j; reduce_to_pairs_j -> google_model_compare_j (j = 1..6)]\n" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display(graph)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "uniflow", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.13" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/example/transform/openai_paper_comparison_model.ipynb b/example/transform/openai_paper_comparison_model.ipynb new file mode 100644 index 00000000..16b7c735 --- /dev/null +++ b/example/transform/openai_paper_comparison_model.ipynb @@ -0,0 +1,299 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7cbc4c4a", + "metadata": {}, + "source": [ + "# Notebook for Paper Comparison Flow via OpenAI\n", + "In this example, we will show you how to compare two papers section by section using OpenAI's models via `uniflow`'s [OpenAIJsonModelFlow](https://github.com/CambioML/uniflow/blob/main/uniflow/flow/model_flow.py#L125).\n", + "\n", + "\n", + "### Before running the code\n", + "\n", + "You will need the `uniflow` conda environment to run this notebook. 
You can set up the environment by following the [installation instructions](https://github.com/CambioML/uniflow/tree/main#installation).\n", + "\n", + "Next, you will need a valid [OpenAI API key](https://platform.openai.com/api-keys) to run the code. Once you have the key, set it as the environment variable `OPENAI_API_KEY` within a `.env` file in the root directory of this repository. For more details, see these [instructions](https://github.com/CambioML/uniflow/tree/main#api-keys).\n", + "\n", + "In this example, we'll use two papers in markdown format, located under 'example/transform/data/raw_input/'." + ] + }, + { + "cell_type": "markdown", + "id": "a3ce3754", + "metadata": {}, + "source": [ + "### Update system path" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "id": "172a856a", + "metadata": {}, + "outputs": [], + "source": [ + "%reload_ext autoreload\n", + "%autoreload 2\n", + "\n", + "import sys\n", + "\n", + "sys.path.append(\".\")\n", + "sys.path.append(\"..\")\n", + "sys.path.append(\"../..\")" + ] + }, + { + "cell_type": "markdown", + "id": "a594b4c3", + "metadata": {}, + "source": [ + "### Install helper packages" + ] + }, + { + "cell_type": "markdown", + "id": "7d84aefd", + "metadata": {}, + "source": [ + "### Import Dependency" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "id": "8d84dd70", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 82, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from dotenv import load_dotenv\n", + "\n", + "from uniflow.flow.flow_factory import FlowFactory\n", + "from uniflow.flow.client import TransformClient\n", + "from uniflow.flow.config import TransformOpenAIConfig\n", + "from uniflow.op.model.model_config import OpenAIModelConfig\n", + "from uniflow.op.prompt import Context\n", + "\n", + "load_dotenv()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "id": "2340ddee", + "metadata": {}, + "outputs": [ 
+ { + "data": { + "text/plain": [ + "{'extract': ['ExtractHTMLFlow',\n", + " 'ExtractImageFlow',\n", + " 'ExtractIpynbFlow',\n", + " 'ExtractMarkdownFlow',\n", + " 'ExtractPDFFlow',\n", + " 'ExtractTxtFlow',\n", + " 'ExtractGmailFlow'],\n", + " 'transform': ['TransformAzureOpenAIFlow',\n", + " 'TransformCopyFlow',\n", + " 'TransformGoogleFlow',\n", + " 'TransformGoogleMultiModalModelFlow',\n", + " 'TransformHuggingFaceFlow',\n", + " 'TransformLMQGFlow',\n", + " 'TransformOpenAIFlow',\n", + " 'TransformSummaryGoogleFlow',\n", + " 'TransformComparisonOpenAIFlow'],\n", + " 'rater': ['RaterFlow']}" + ] + }, + "execution_count": 83, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "FlowFactory.list()" + ] + }, + { + "cell_type": "markdown", + "id": "cb677037", + "metadata": {}, + "source": [ + "### Prepare the input data\n", + "Both papers are already preprocessed into Markdown format." + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "id": "fc6f290c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['# The Impact of Artificial Intelligence on Healthcare: A Review\\n\\n## Abstract\\nArtificial intelligence (AI) has emerged as a transformative technology in healthcare, offering opportunities to improve patient outcomes, streamline processes, and enhance decision-making. This paper provides an overview of the current state of AI in healthcare, explores its applications, benefits, challenges, and future prospects.\\n\\n## Introduction\\nIn recent years, artificial intelligence has gained significant traction across various industries, and healthcare is no exception. With advancements in machine learning, natural language processing, and robotics, AI has the potential to revolutionize healthcare delivery, diagnosis, treatment, and management. 
This paper aims to delve into the role of AI in healthcare, highlighting its implications, challenges, and future directions.\\n\\n## Background\\nThe integration of AI into healthcare systems has been facilitated by the exponential growth of data, coupled with the development of sophisticated algorithms. Machine learning algorithms, such as deep learning, support vector machines, and random forests, enable healthcare providers to analyze large datasets, identify patterns, and extract actionable insights. Furthermore, natural language processing techniques empower AI systems to interpret and generate human language, facilitating tasks such as clinical documentation, medical coding, and patient communication.\\n\\n## Approach\\nTo examine the impact of AI on healthcare, we conducted a comprehensive literature review, analyzing research articles, industry reports, and case studies. We focused on key applications of AI in healthcare, including disease diagnosis, personalized treatment planning, drug discovery, remote patient monitoring, and predictive analytics. Additionally, we explored the challenges associated with AI adoption in healthcare, such as data privacy concerns, regulatory barriers, algorithm bias, and interoperability issues.\\n\\n## Experiment/Result\\nOur analysis revealed that AI holds immense promise for transforming healthcare delivery and improving patient outcomes. AI-powered diagnostic systems demonstrate high accuracy and efficiency in detecting various medical conditions, ranging from cancer and cardiovascular diseases to infectious diseases and neurological disorders. Moreover, AI-driven predictive analytics enable healthcare providers to anticipate disease outbreaks, optimize resource allocation, and enhance population health management. 
Despite these advancements, several challenges hinder the widespread adoption of AI in healthcare, including data quality issues, algorithmic bias, ethical considerations, and regulatory constraints.\\n\\n## Conclusion/Future Work\\nLooking ahead, future research should focus on addressing the technical, ethical, and regulatory challenges associated with AI in healthcare. Efforts to enhance the interpretability, fairness, and transparency of AI algorithms are critical to building trust among healthcare professionals and patients. Moreover, interdisciplinary collaboration between computer scientists, healthcare professionals, policymakers, and ethicists is essential to develop robust frameworks for AI governance and ensure responsible AI deployment in healthcare settings. Additionally, longitudinal studies are needed to assess the long-term impact of AI on patient outcomes, healthcare costs, and healthcare disparities. By addressing these challenges and leveraging the full potential of AI, we can unlock new opportunities for advancing healthcare delivery, enhancing clinical decision-making, and ultimately improving the quality of care for patients worldwide.\\n',\n", + " '# The Impact of Renewable Energy Adoption on Global Carbon Emissions: An Analytical Study\\n\\n## Abstract\\nThis paper examines the impact of renewable energy adoption on global carbon emissions. With climate change posing a significant threat to the environment and human societies, transitioning to renewable energy sources has become a crucial global initiative. This study analyzes the correlation between increased use of renewable energy sources, such as wind, solar, and hydro, and the subsequent changes in carbon emissions worldwide. Utilizing data from various countries over the past two decades, we employ statistical models to assess the effectiveness of renewable energy in reducing carbon footprints. 
Our findings suggest that renewable energy adoption is a viable strategy for significantly reducing global carbon emissions, highlighting the need for policies that support renewable energy investments and infrastructure development.\\n\\n## Introduction\\nClimate change remains one of the most pressing challenges of our time, with carbon emissions from fossil fuel consumption being a primary contributor. The transition to renewable energy sources is widely viewed as a vital step towards mitigating climate change impacts. This paper explores the effectiveness of renewable energy adoption in reducing global carbon emissions. By examining data from multiple countries, we aim to provide a comprehensive analysis of how renewable energy usage influences carbon emission trends and to evaluate the potential of renewable energy as a sustainable solution to climate change.\\n\\n## Background\\nThe relationship between human activities, especially the burning of fossil fuels, and climate change is well-documented. Renewable energy sources offer an alternative that does not emit carbon dioxide during operation, thus presenting a potential pathway to decarbonize the energy sector. Governments and organizations worldwide have made commitments to increase the share of renewables in their energy mix. This paper builds on existing research by analyzing more recent data to understand the current impact of renewable energy adoption on carbon emissions.\\n\\n## Approach\\nOur approach involves collecting and analyzing data on renewable energy consumption and carbon emissions from various countries over the last twenty years. We focus on wind, solar, and hydroelectric power due to their significant growth and potential for large-scale implementation. The study employs statistical analysis methods to identify trends, correlations, and causations between the adoption of renewable energy and changes in carbon emissions. 
We adjust for factors such as economic growth, population changes, and energy efficiency improvements to isolate the impact of renewable energy.\\n\\n## Experiment/Result\\nThe analysis reveals a clear negative correlation between the adoption of renewable energy sources and carbon emissions in countries with aggressive renewable energy policies. For instance, countries that have doubled their renewable energy consumption in the past decade have seen, on average, a 10% reduction in carbon emissions, even after accounting for economic and population growth. These findings are consistent across developed and developing nations, suggesting that renewable energy can be an effective tool for reducing carbon emissions globally.\\n\\n## Conclusion/Future Work\\nThe study confirms that renewable energy adoption plays a crucial role in reducing global carbon emissions. The findings support the need for policies and investments that encourage the development and deployment of renewable energy technologies. Future work should focus on longitudinal studies to track the long-term impact of renewable energy adoption on carbon emissions. 
Additionally, further research is needed to explore the socio-economic benefits of transitioning to renewable energy, such as job creation, health improvements, and energy security, to provide a more comprehensive understanding of its impacts.\\n\\n\\n']" + ] + }, + "execution_count": 86, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "with open(r\"data/raw_input/paper_1_raw.md\", 'r') as file:\n", + " paper_1_content = file.read()\n", + "\n", + "with open(r\"data/raw_input/paper_2_raw.md\", 'r') as file:\n", + " paper_2_content = file.read()\n", + "\n", + "raw_context_input = [\n", + " paper_1_content,\n", + " paper_2_content,\n", + "]\n", + "raw_context_input" + ] + }, + { + "cell_type": "markdown", + "id": "a3180ff3", + "metadata": {}, + "source": [ + "### Use LLM to compare two papers\n", + "\n", + "In this example, we will use the [OpenAIModelConfig](https://github.com/CambioML/uniflow/blob/main/uniflow/model/config.py#L17)'s default LLM to compare the two papers." + ] + }, + { + "cell_type": "code", + "execution_count": 99, + "id": "c3e75e65", + "metadata": {}, + "outputs": [], + "source": [ + "input_data = [[Context(Context=raw_context_input[0]), Context(Context=raw_context_input[1])]]\n", + "config = TransformOpenAIConfig(\n", + " flow_name=\"TransformComparisonOpenAIFlow\",\n", + " model_config=OpenAIModelConfig(),\n", + ")\n", + "client = TransformClient(config)" + ] + }, + { + "cell_type": "markdown", + "id": "06b94c94", + "metadata": {}, + "source": [ + "Now we call the `run` method on the `client` object to run the comparison on the data shown above." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 100, + "id": "2103149e", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 1/1 [01:05<00:00, 65.77s/it]\n" + ] + } + ], + "source": [ + "output = client.run(input_data)" + ] + }, + { + "cell_type": "markdown", + "id": "aef58403", + "metadata": {}, + "source": [ + "### Process the output\n", + "\n", + "Let's take a look at the generated output. We need to do a little postprocessing on the raw output." + ] + }, + { + "cell_type": "code", + "execution_count": 101, + "id": "07af0669", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Similarities between paper A and paper B:\\n1. Both papers focus on the impact of renewable energy utilization on global carbon emissions in response to climate change.\\n2. Both papers emphasize the need to transition from non-renewable to renewable energy sources to reduce carbon emissions.\\n3. They both highlight the importance of renewable energy sources such as solar, wind, and hydroelectric power as potential solutions to mitigate climate change.\\n\\nDifferences between paper A and paper B:\\n1. Paper A specifically mentions the use of statistical models to analyze data from various countries over the past 20 years to assess the effectiveness of renewable energy in reducing carbon footprints, while paper B does not mention specific research methods.\\n2. Paper A focuses on the findings that suggest adopting renewable energy can significantly reduce global carbon emissions, emphasizing the importance of policies that support renewable energy investments and infrastructure development, whereas paper B does not mention specific findings or policy implications.',\n", + " 'The similarities between paper A and paper B are as follows:\\n\\n1. Both papers highlight the potential of artificial intelligence (AI) to revolutionize the healthcare industry and improve patient outcomes.\\n\\n2. 
They both acknowledge the need for further research to fully understand the benefits and challenges of integrating AI into healthcare systems.\\n\\n3. Both papers discuss specific applications of AI in healthcare, such as medical imaging analysis, drug discovery, personalized medicine, and patient care management.\\n\\n4. They both address the potential challenges and limitations of AI in healthcare, including data privacy concerns, algorithm bias, and the need for ongoing validation and regulation of AI technologies.\\n\\n5. Both papers emphasize the importance of ethical considerations, regulatory frameworks, and ongoing evaluation of AI technologies to ensure their safe and effective implementation in healthcare settings.\\n\\nThe main difference between the two papers is that paper A provides a comprehensive review of the current state of AI in healthcare, while paper B aims to delve into the role of AI in healthcare, emphasizing its implications, challenges, and future directions. Additionally, paper B specifically mentions the advancements in machine learning, natural language processing, and robotics as potential game-changers in healthcare, which is not explicitly mentioned in paper A.',\n", + " 'Similarities:\\n- Both papers discuss the potential benefits of integrating AI into healthcare and the impact of renewable energy adoption on global carbon emissions.\\n- Both papers highlight the importance of addressing challenges and limitations in their respective fields, such as the need for large, high-quality datasets in AI integration and the lack of consensus in existing research on the impact of renewable energy adoption.\\n\\nDifferences:\\n- Paper A focuses on the integration of AI into healthcare systems and discusses specific applications of AI in disease identification, personalized treatment planning, drug discovery, and predictive analytics. 
In contrast, paper B focuses on the impact of renewable energy adoption on global carbon emissions, outlining the objectives, methodology, and expected contributions of the study.\\n- While both papers discuss challenges and limitations, the specific challenges discussed in each paper are different. Paper A discusses challenges related to data privacy, security, regulatory compliance, and the \"black box\" nature of AI algorithms. Paper B focuses on the lack of consensus and mixed findings in existing research on the impact of renewable energy adoption on carbon emissions, as well as the different methodologies and approaches used in previous research.',\n", + " 'Similarities:\\n- Both papers involve a comprehensive literature review and analysis of various sources to gain insights into their respective topics.\\n- Both papers aim to provide a deeper understanding of the impact and challenges related to the adoption of a specific technology (AI in healthcare for paper A, renewable energy for paper B).\\n\\nDifferences:\\n- Paper A focuses on the impact of AI on healthcare, including applications such as disease diagnosis, personalized treatment planning, and drug discovery, as well as challenges related to AI adoption in healthcare.\\n- Paper B, on the other hand, discusses the relationship between human activities and climate change, with a specific focus on the potential of renewable energy sources to reduce carbon emissions. It also aims to present empirical evidence related to the impact of renewable energy adoption on carbon emissions.',\n", + " 'Similarities:\\n1. Both papers focus on the potential benefits of advanced technology (AI in paper A, renewable energy in paper B) in specific fields (healthcare in paper A, carbon emissions in paper B).\\n2. Both papers acknowledge the challenges and obstacles that need to be addressed in order to fully utilize the potential benefits of the technology being studied.\\n\\nDifferences:\\n1. 
Paper A focuses on the potential of AI in healthcare delivery and patient outcomes, while paper B focuses on the impact of renewable energy usage on carbon emissions.\\n2. Paper A discusses the specific application of AI in diagnostic systems and predictive analytics for disease management, while paper B discusses the analysis of data on renewable energy usage and carbon emissions from different countries.\\n3. Paper A highlights challenges related to data quality, algorithmic bias, ethical considerations, and regulatory constraints, while paper B accounts for factors such as economic growth, population changes, and advancements in energy efficiency in its analysis.',\n", + " 'Similarities:\\n- Both papers call for future research and efforts to address challenges and make improvements in their respective fields.\\n- They both emphasize the need for further studies to understand the impact of their research topics (AI in healthcare and renewable energy sources) on global issues (healthcare outcomes and carbon emissions).\\n\\nDifferences:\\n- Paper A focuses on the need for interdisciplinary collaboration between computer scientists, healthcare professionals, policymakers, and ethicists, while Paper B focuses on the analysis of the relationship between renewable energy consumption and carbon emissions in different countries.\\n- Paper A discusses the importance of improving the interpretability, fairness, and transparency of AI algorithms, while Paper B presents findings on the effectiveness of renewable energy in reducing carbon emissions on a global scale.']" + ] + }, + "execution_count": 101, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "comparison = []\n", + "\n", + "for item in output:\n", + " for i in item.get('output', []):\n", + " for response in i.get('response', []):\n", + " comparison.append(response)\n", + "\n", + "comparison" + ] + }, + { + "cell_type": "markdown", + "id": "a2942fd0", + "metadata": {}, + "source": [ + 
"## End of the notebook\n", + "\n", + "Check more Uniflow use cases in the [example folder](https://github.com/CambioML/uniflow/tree/main/example/model#examples)!\n", + "\n", + "\n", + " \n", + "" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tests/op/basic/test_group_op.py b/tests/op/basic/test_group_op.py new file mode 100644 index 00000000..f8165384 --- /dev/null +++ b/tests/op/basic/test_group_op.py @@ -0,0 +1,53 @@ +"""Test cases for GroupOp.""" + +import unittest + +from uniflow.node import Node +from uniflow.op.basic.group_op import GroupOp +from uniflow.op.prompt import Context + + +class TestGroupOp(unittest.TestCase): + def setUp(self): + self.preprocess_fn = lambda nodes_1, nodes_2: [ + ( + node_label.value_dict["response"][0], + node_summary.value_dict["response"][0], + ) + for node_label, node_summary in zip(nodes_1, nodes_2) + ] + self.group_fn = lambda labels, summaries: { + label: [s for l, s in zip(labels, summaries) if l == label] + for label in set(labels) + } + self.group_op = GroupOp("test_group", self.preprocess_fn, self.group_fn) + + def test_init(self): + self.assertEqual(self.group_op._preprocess_fn, self.preprocess_fn) + self.assertEqual(self.group_op._fn, self.group_fn) + + def test_call(self): + node_a0 = Node("node_a0", {"response": ["Introduction"]}) + node_a1 = Node("node_a1", {"response": ["Introduction"]}) + node_a2 = Node("node_a2", {"response": ["Abstract"]}) + + node_b0 = Node("node_b0", {"response": ["A paper about life itself"]}) + node_b1 = Node("node_b1", {"response": ["Life is complicated"]}) + node_b2 = Node("node_b2", 
{"response": ["Happy wife, happy life"]}) + + nodes_1 = [node_a0, node_a1, node_a2] + nodes_2 = [node_b0, node_b1, node_b2] + output_nodes = self.group_op(nodes_1, nodes_2) + + self.assertEqual(len(output_nodes), 2) + self.assertEqual( + output_nodes[0].value_dict, [Context(context="Happy wife, happy life")] + ) + self.assertEqual( + output_nodes[1].value_dict, + [Context(context="A paper about life itself Life is complicated")], + ) + + +if __name__ == "__main__": + unittest.main() diff --git a/uniflow/flow/transform/__init__.py b/uniflow/flow/transform/__init__.py index 0d0613d1..bd3bcd78 100644 --- a/uniflow/flow/transform/__init__.py +++ b/uniflow/flow/transform/__init__.py @@ -8,6 +8,12 @@ from uniflow.flow.transform.transform_azure_openai_flow import ( # noqa: F401, F403 TransformAzureOpenAIFlow, ) +from uniflow.flow.transform.transform_comparison_google_flow import ( # noqa: F401, F403 + TransformComparisonGoogleFlow, +) +from uniflow.flow.transform.transform_comparison_openai_flow import ( # noqa: F401, F403 + TransformComparisonOpenAIFlow, +) from uniflow.flow.transform.transform_copy_flow import ( # noqa: F401, F403 TransformCopyFlow, ) @@ -35,4 +41,6 @@ "TransformAzureOpenAIFlow", "TransformGoogleFlow", "TransformGoogleMultiModalModelFlow", + "TransformComparisonGoogleFlow", + "TransformComparisonOpenAIFlow", ] diff --git a/uniflow/flow/transform/transform_comparison_google_flow.py b/uniflow/flow/transform/transform_comparison_google_flow.py new file mode 100644 index 00000000..51bfb434 --- /dev/null +++ b/uniflow/flow/transform/transform_comparison_google_flow.py @@ -0,0 +1,172 @@ +"""Paper Comparison Google Model Flow Module.""" + +import re +from typing import Any, Dict, Sequence + +from uniflow.constants import TRANSFORM +from uniflow.flow.flow import Flow +from uniflow.node import Node +from uniflow.op.basic.expand_op import ExpandOp +from uniflow.op.basic.group_op import GroupOp +from uniflow.op.basic.reduce_op import ReduceOp +from 
uniflow.op.model.lm.model import LmModel +from uniflow.op.model.model_op import ModelOp +from uniflow.op.prompt import Context, PromptTemplate + + +class GoogleComparisonFlow(Flow): + """Google Comparison Flow Class.""" + + def __init__( + self, + prompt_template: PromptTemplate, + model_config: Dict[str, Any], + ) -> None: + """Google Comparison Flow Constructor. + + Args: + prompt_template (PromptTemplate): Guided prompt template. + model_config (Dict[str, Any]): Model config. + """ + # TODO: Refactoring needed to make model_op output Context format. Need to keep it in Context format and only convert back to dictionary format before exiting Flow + super().__init__() + + # Expand list of nodes to two or more nodes + self._expand_from_papers = ExpandOp( + name="expand_to_paper_node_from_nodes", + fn=lambda x: [[x[0][i]] for i in range(len(x[0]))], + ) + + # Split into chunks + self._expand_to_chunks = ExpandOp( + name="split_to_chunks", + fn=lambda markdown_content: [ + [Context(context=item.strip())] + for item in re.split(r"\n\s*\n", markdown_content[0].Context) + if item.strip() + ], + ) + + # TODO: Refactoring needed to make model_op output Context format + # Add label + label_prompt_template = PromptTemplate( + instruction=""" + Assume you're a research scientist and are reading a research paper. Classify the paragraph to be one of following categories: + "1-Abstract", "2-Introduction", "3-Background", "4-Approach", "5-Experiment or Result", "6-Conclusion or Future work" + """, + ) + + self._model_label = ModelOp( + name="google_model_label", + model=LmModel( + prompt_template=label_prompt_template, + model_config=model_config, + ), + ) + + # TODO: Refactoring needed to make model_op output Context format + # Summarize + summary_prompt_template = PromptTemplate( + instruction=""" + Assume you're a research scientist and are reading a research paper. + Please provide a detailed and comprehensive summary of each paragraph in the essay. 
+ """, + ) + + self._model_summary = ModelOp( + name="google_model_summary", + model=LmModel( + prompt_template=summary_prompt_template, + model_config=model_config, + ), + ) + + # Group summaries by label + self._group = GroupOp( + name="summaries_groupby_labels", + preprocss_fn=lambda nodes_1, nodes_2: [ + ( + node_label.value_dict["response"][0], + node_summary.value_dict["response"][0], + ) + for node_label, node_summary in zip(nodes_1, nodes_2) + ], + fn=lambda labels, summaries: { + label: [s for l, s in zip(labels, summaries) if l == label] + for label in set(labels) + }, + given_fixed_labels=[ + "1-Abstract", + "2-Introduction", + "3-Background", + "4-Approach", + "5-Experiment or Result", + "6-Conclusion or Future work", + ], + ) + + # Reduce pair chunks from each paper into list of nodes + self._reduce_op = ReduceOp( + name="reduce_to_pairs", + fn=lambda list1, list2: [ + Context(context=f"paper A: {a.context}, paper B: {b.context}") + for a, b in zip(list1, list2) + ], + ) + + # Compare + compare_prompt_template = PromptTemplate( + instruction=""" + Assume you're a research scientist and are reading two research papers. + Compare between paper A and paper B. Note their similarities and differences if applicable. + """, + ) + + self._model_compare = ModelOp( + name="google_model_compare", + model=LmModel( + prompt_template=compare_prompt_template, + model_config=model_config, + ), + ) + + def run(self, nodes: Sequence[Node]) -> Sequence[Node]: + """Run Model Flow. + + Args: + nodes (Sequence[Node]): Nodes to run. + + Returns: + Sequence[Node]: Nodes after running. 
+ """ + paper1_node, paper2_node = self._expand_from_papers(nodes[0]) + + paper1_node_chunks = self._expand_to_chunks(paper1_node) + paper2_node_chunks = self._expand_to_chunks(paper2_node) + + paper1_node_chunks_labels = self._model_label(paper1_node_chunks) + paper1_node_chunks_summaries = self._model_summary(paper1_node_chunks) + + paper2_node_chunks_labels = self._model_label(paper2_node_chunks) + paper2_node_chunks_summaries = self._model_summary(paper2_node_chunks) + + paper1_node_grouped = self._group( + paper1_node_chunks_labels, paper1_node_chunks_summaries + ) + paper2_node_grouped = self._group( + paper2_node_chunks_labels, paper2_node_chunks_summaries + ) + + combined_nodes = [] + for node_1, node_2 in zip(paper1_node_grouped, paper2_node_grouped): + combined_nodes.append(self._reduce_op([(node_1, node_2)])[0]) + + # TODO: add a model to fine-tune the overall comparison if needed + + return self._model_compare(combined_nodes) + + +class TransformComparisonGoogleFlow(GoogleComparisonFlow): + """Transform Comparison Google Flow Class.""" + + TAG = TRANSFORM diff --git a/uniflow/flow/transform/transform_comparison_openai_flow.py b/uniflow/flow/transform/transform_comparison_openai_flow.py new file mode 100644 index 00000000..0f88c3e8 --- /dev/null +++ b/uniflow/flow/transform/transform_comparison_openai_flow.py @@ -0,0 +1,212 @@ +"""Paper Comparison OpenAI Model Flow Module.""" + +import re +from typing import Any, Dict, Sequence + +from uniflow.constants import TRANSFORM +from uniflow.flow.flow import Flow +from uniflow.node import Node +from uniflow.op.basic.expand_op import ExpandOp +from uniflow.op.basic.group_op import GroupOp +from uniflow.op.basic.reduce_op import ReduceOp +from uniflow.op.model.lm.model import JsonLmModel, LmModel +from uniflow.op.model.model_op import ModelOp +from uniflow.op.prompt import Context, PromptTemplate + + +class OpenAIComparisonFlow(Flow): + """OpenAI Comparison Flow Class.""" + + def __init__( + self, + prompt_template: PromptTemplate, + 
model_config: Dict[str, Any], + ) -> None: + """OpenAI Comparison Flow Constructor. + + Args: + prompt_template (PromptTemplate): Guided prompt template. + model_config (Dict[str, Any]): Model config. + """ + # TODO: Refactoring needed to make model_op output Context format. Need to keep it in Context format and only convert back to dictionary format before exiting Flow + super().__init__() + if model_config["response_format"]["type"] == "json_object": + model = JsonLmModel( + prompt_template=prompt_template, + model_config=model_config, + ) + else: + model = LmModel( + prompt_template=prompt_template, + model_config=model_config, + ) + + # Expand list of nodes to two or more nodes + self._expand_from_papers = ExpandOp( + name="expand_to_paper_node_from_nodes", + fn=lambda x: [[x[0][i]] for i in range(len(x[0]))], + ) + + # Split into chunks + self._expand_to_chunks = ExpandOp( + name="split_to_chunks", + fn=lambda markdown_content: [ + [Context(context=item.strip())] + for item in re.split(r"\n\s*\n", markdown_content[0].Context) + if item.strip() + ], + ) + + # TODO: Refactoring needed to make model_op output Context format + # Add label + label_prompt_template = PromptTemplate( + instruction=""" + Assume you're a research scientist and are reading a research paper. You must classify it to be exactly one of the categories below, no extra wording. Classify the paragraph to be one of following categories: + 'label: 1-Abstract', 'label: 2-Introduction', 'label: 3-Background', 'label: 4-Approach', 'label: 5-Experiment or Result', 'label: 6-Conclusion or Future work'. + If you are really unsure about it or don't have access to the content, just strictly return 'label: 3-Background'. Follow the example below. + """, + few_shot_prompt=[ + Context( + context="This study investigates the efficacy of mindfulness meditation in reducing stress levels among college students. 
A sample of 100 undergraduate students was randomly assigned to either a mindfulness meditation group or a control group. The mindfulness group underwent an eight-week mindfulness meditation program, while the control group received no intervention. Stress levels were measured using standardized self-report scales before and after the intervention period. Results indicate a significant reduction in perceived stress levels among participants in the mindfulness group compared to the control group. These findings suggest that mindfulness meditation may serve as an effective strategy for stress reduction among college students, highlighting its potential benefits for mental health promotion in academic settings.", + label="1-Abstract", + ), + Context( + context="Coral bleaching events pose a significant threat to the health and biodiversity of coral reef ecosystems, with climate change identified as a primary driver of these phenomena. This study utilizes satellite imagery and climate data to analyze the relationship between sea surface temperature anomalies and coral bleaching occurrences in the Great Barrier Reef (GBR) over the past two decades. Our results reveal a strong correlation between increased sea surface temperatures and the frequency and severity of coral bleaching events in the GBR region. Furthermore, projections based on climate models suggest a continued escalation of these events in the coming years under current emission scenarios. These findings underscore the urgent need for targeted conservation efforts and mitigation strategies to safeguard the long-term resilience of coral reef ecosystems in the face of climate change.", + label="1-Abstract", + ), + Context( + context="The global imperative to mitigate climate change and transition towards sustainable energy sources has propelled the rapid expansion of renewable energy generation. 
While renewable technologies offer immense potential for decarbonizing the electricity sector, their intermittent nature and spatial variability pose significant challenges to grid stability and reliability.", + label="2-Introduction", + ), + Context( + context="Author XYZ", + label="3-Background", + ), + Context( + context="XYZ University", + label="3-Background", + ), + Context( + context="Figure 3 illustrates the mean values of the dependent variable for each group, with the experimental group showing a noticeable improvement compared to the control group. Moreover, correlation analysis revealed a strong positive relationship between the treatment dosage and the improvement level (r = 0.78, p < 0.01), further substantiating the hypothesis. These results are consistent with the theoretical framework proposed, suggesting that the intervention directly contributes to the observed outcomes.", + label="5-Experiment or Result", + ), + Context( + context="In conclusion, the findings from this study provide substantial evidence supporting the hypothesis that the intervention significantly improves the outcome measures compared to the control. The statistical analysis, indicating both significance and a strong positive correlation between treatment dosage and effect size, underscores the potential of the intervention for practical applications. ", + label="6-Conclusion or Future work", + ), + ], + ) + self._model_label = ModelOp( + name="openai_model_label", + model=LmModel( + prompt_template=label_prompt_template, + model_config=model_config, + ), + ) + + # TODO: Refactoring needed to make model_op output Context format + # Summarize + summary_prompt_template = PromptTemplate( + instruction=""" + Assume you're a research scientist and are reading a research paper. + Please provide a detailed and comprehensive summary of each paragraph in the essay. Ignore insignificant and minor details such as emails. 
+ """, + ) + + self._model_summary = ModelOp( + name="openai_model_summarize", + model=LmModel( + prompt_template=summary_prompt_template, + model_config=model_config, + ), + ) + + # Group summaries by label + self._group = GroupOp( + name="summaries_groupby_labels", + preprocss_fn=lambda nodes_1, nodes_2: [ + ( + node_label.value_dict["response"][0], + node_summary.value_dict["response"][0], + ) + for node_label, node_summary in zip(nodes_1, nodes_2) + ], + fn=lambda labels, summaries: { + label: [s for l, s in zip(labels, summaries) if l == label] + for label in set(labels) + }, + given_fixed_labels=[ + "label: 1-Abstract", + "label: 2-Introduction", + "label: 3-Background", + "label: 4-Approach", + "label: 5-Experiment or Result", + "label: 6-Conclusion or Future work", + ], + ) + + # Reduce pair chunks from each paper into list of nodes + self._reduce_op = ReduceOp( + name="reduce_to_pairs", + fn=lambda list1, list2: [ + Context(context=f"paper A: {a.context}, paper B: {b.context}") + for a, b in zip(list1, list2) + ], + ) + + # Compare + compare_prompt_template = PromptTemplate( + instruction=""" + Assume you're a research scientist and are reading two research papers. Ignore insignificant and minor details such as emails. + Compare between paper A and paper B. Note their similarities and differences if applicable. + """, + ) + + self._model_compare = ModelOp( + name="openai_model_compare_chunks", + model=LmModel( + prompt_template=compare_prompt_template, + model_config=model_config, + ), + ) + + def run(self, nodes: Sequence[Node]) -> Sequence[Node]: + """Run Model Flow. + + Args: + nodes (Sequence[Node]): Nodes to run. + + Returns: + Sequence[Node]: Nodes after running. 
+ """ + paper1_node, paper2_node = self._expand_from_papers(nodes[0]) + + paper1_node_chunks = self._expand_to_chunks(paper1_node) + paper2_node_chunks = self._expand_to_chunks(paper2_node) + + paper1_node_chunks_labels = self._model_label(paper1_node_chunks) + paper1_node_chunks_summaries = self._model_summary(paper1_node_chunks) + + paper2_node_chunks_labels = self._model_label(paper2_node_chunks) + paper2_node_chunks_summaries = self._model_summary(paper2_node_chunks) + + paper1_node_grouped = self._group( + paper1_node_chunks_labels, paper1_node_chunks_summaries + ) + paper2_node_grouped = self._group( + paper2_node_chunks_labels, paper2_node_chunks_summaries + ) + + combined_nodes = [] + for node_1, node_2 in zip(paper1_node_grouped, paper2_node_grouped): + combined_nodes.append(self._reduce_op([(node_1, node_2)])[0]) + + # TODO: add a model to fine-tune the overall comparison if needed + + return self._model_compare(combined_nodes) + + +class TransformComparisonOpenAIFlow(OpenAIComparisonFlow): + """Transform Comparison OpenAI Flow Class.""" + + TAG = TRANSFORM diff --git a/uniflow/op/basic/group_op.py b/uniflow/op/basic/group_op.py new file mode 100644 index 00000000..01159546 --- /dev/null +++ b/uniflow/op/basic/group_op.py @@ -0,0 +1,97 @@ +"""Group operation module.""" + +from typing import Any, Callable, Mapping, Optional, Sequence + +from uniflow.node import Node +from uniflow.op.op import Op +from uniflow.op.prompt import Context + + +class GroupOp(Op): + """Group Operation.""" + + def __init__( + self, + name: str, + preprocss_fn: Callable[ + [Mapping[str, Any], Mapping[str, Any]], Mapping[str, Any] + ], + fn: Callable[[Mapping[str, Any], Mapping[str, Any]], Mapping[str, Any]], + given_fixed_labels: Optional[list] = None, + ) -> None: + """Initializes group operation. + + Args: + name (str): Name of the group operation. + preprocss_fn (callable): Function to extract. + fn (callable): Function to group. 
+ given_fixed_labels (Optional[list]): Fixed labels to include in the output even when no content exists for them. + """ + super().__init__(name) + self._fn = fn + self._preprocess_fn = preprocss_fn + self._given_fixed_labels = given_fixed_labels if given_fixed_labels else [] + + def __call__( + self, nodes_1: Sequence[Node], nodes_2: Sequence[Node] + ) -> Sequence[Node]: + """Calls group operation. + The preprocess function (preprocss_fn) first extracts information such as the label and summary from each node's dictionary. + The grouping function (fn) then groups the summaries by label. + The result is a list of nodes, where each node holds the concatenated summaries of all nodes sharing the same label. + If given_fixed_labels is provided, labels with no summaries are still included in the result. + + Args: + nodes_1 (Sequence[Node]), nodes_2 (Sequence[Node]): The two input node sequences. + + Returns: + Sequence[Node]: Output nodes. + """ + output_nodes = [] + + labels, summaries = zip(*self._preprocess_fn(nodes_1, nodes_2)) + aggregated_summaries = self._fn(labels, summaries) + sorted_labels = sorted(aggregated_summaries.keys()) + + # Handle missing sections (labels for which no summaries were given) + if self._given_fixed_labels: + for label in self._given_fixed_labels: + if label not in sorted_labels: + sorted_labels.append(label) + sorted_labels.sort() + + label_nodes = {label: [] for label in sorted_labels} + + for node in nodes_1: + label = node.value_dict["response"][0] + if label in label_nodes: + label_nodes[label].append(node) + + for label in sorted_labels: + try: + summary_list = aggregated_summaries[label] + combined_summary = " ".join(summary_list) + value_dict = [Context(context=combined_summary)] + + prev_nodes = label_nodes[label] + + for node in nodes_2: + if node.value_dict["response"][0] in summary_list: + prev_nodes.append(node) + + output_nodes.append( + Node( + name=self.unique_name(), + value_dict=value_dict, 
+ prev_nodes=prev_nodes, + ) + ) + except KeyError: # label with no summaries + output_nodes.append( + Node( + name=self.unique_name(), + value_dict=[Context(context="")], + ) + ) + + return output_nodes diff --git a/uniflow/op/basic/transform_op.py b/uniflow/op/basic/transform_op.py index 0b423048..df709845 100644 --- a/uniflow/op/basic/transform_op.py +++ b/uniflow/op/basic/transform_op.py @@ -1,6 +1,6 @@ """Transform operation module.""" -from typing import Any, Callable, Mapping, Sequence, Tuple +from typing import Any, Callable, Mapping, Sequence from uniflow.node import Node from uniflow.op.op import Op @@ -23,7 +23,7 @@ def __init__( super().__init__(name) self._fn = fn - def __call__(self, nodes: Sequence[Tuple[Node, Node]]) -> Sequence[Node]: + def __call__(self, nodes: Sequence[Node]) -> Sequence[Node]: """Calls transform operation. Args: