
Commit 54b5bfa

Thomas Hoffmann, Ian Fan, Dimitri Kartsaklis, Nikhil Khatri and Charles London committed

Release version 0.3.0

Co-authored-by: Ian Fan <ian.fan@quantinuum.com>
Co-authored-by: Dimitri Kartsaklis <dimitri.kartsaklis@quantinuum.com>
Co-authored-by: Nikhil Khatri <nikhil.khatri@quantinuum.com>
Co-authored-by: Charles London <charles.london@quantinuum.com>
Co-authored-by: Richie Yeung <richie.yeung@quantinuum.com>

1 parent 70a1fe8, commit 54b5bfa


97 files changed (+4677 -766 lines)

.github/workflows/build_test.yml (+8 -9)

@@ -17,7 +17,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [ 3.8, 3.9, "3.10" ]
+        python-version: [ 3.8, 3.9, "3.10", "3.11" ]
     outputs:
       error-check: ${{ steps.error-check.conclusion }}
     steps:
@@ -47,16 +47,13 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [ 3.8, 3.9, "3.10" ]
+        python-version: [ 3.8, 3.9, "3.10", "3.11" ]
     steps:
     - uses: actions/checkout@v3
     - name: Setup Python ${{ matrix.python-version }}
       uses: actions/setup-python@v4
       with:
         python-version: ${{ matrix.python-version }}
-    - name: Install DisCoPy 0.5 from GitHub
-      if: github.ref_name != 'release' && github.ref_name != 'beta'
-      run: pip install git+https://github.com/discopy/discopy@0.5
     - name: Install base package
       run: pip install .
     - name: Check package import works
@@ -65,7 +62,7 @@ jobs:
       run: pip install .[extras] .[test]
     - name: Locate bobcat pre-trained model cache
       id: loc-bobcat-cache
-      run: echo "::set-output name=dir::$(python -c 'from lambeq.text2diagram.bobcat_parser import get_model_dir; print(get_model_dir("bert"))')"
+      run: echo "dir=$(python -c 'from lambeq.text2diagram.model_downloader import ModelDownloader; print(ModelDownloader("bert").model_dir)')" >> $GITHUB_OUTPUT
     - name: Restore bobcat pre-trained model from cache
       id: bobcat-cache
       uses: actions/cache@v2
@@ -82,18 +79,20 @@ jobs:
         --ignore=docs/extract_code_cells.py
     - name: Determine if depccg tests should be run
       # only test depccg if it is explicitly changed, since it is very slow
+      # tests are also disabled on Python 3.11
       id: depccg-enabled
       continue-on-error: true # this is expected to fail but the job should still succeed
       run: >
-        git fetch --depth=1 origin ${{ github.base_ref || github.event.before }}:before
+        ${{ matrix.python-version != '3.11' }}
+        && git fetch --depth=1 origin ${{ github.base_ref || github.event.before }}:before
         && git diff --name-only before | grep depccg
     - name: Install depccg and locate depccg pre-trained model cache
       id: loc-depccg-cache
       if: steps.depccg-enabled.outcome == 'success'
       run: |
         pip install cython # must be installed before depccg
         pip install depccg==2.0.3.2
-        echo "::set-output name=dir::$(python -c 'from depccg.instance_models import MODEL_DIRECTORY, MODELS; print(MODEL_DIRECTORY / MODELS["en"][1])')"
+        echo "dir=$(python -c 'from depccg.instance_models import MODEL_DIRECTORY, MODELS; print(MODEL_DIRECTORY / MODELS["en"][1])')" >> $GITHUB_OUTPUT
         pip install lambeq # override dependency conflicts
     - name: Restore depccg pre-trained model from cache
       id: depccg-cache
@@ -117,7 +116,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [ 3.8, 3.9, "3.10" ]
+        python-version: [ 3.8, 3.9, "3.10", "3.11" ]
     steps:
     - uses: actions/checkout@v3
     - name: Setup Python ${{ matrix.python-version }}

.github/workflows/docs.yml (-2)

@@ -31,8 +31,6 @@ jobs:
         pip install -r docs/requirements.txt
     - name: Build documentation
       run: ${{ env.WORKFLOWS_DIR }}/build-docs
-    - name: Move install script
-      run: mv install.sh docs/_build/html
     - name: Deploy documentation
       if: ${{ github.event_name == 'push' && (github.ref_name == 'main' || github.ref_name == 'release') }}
       uses: s0/git-publish-subdir-action@develop

docs/conf.py (+2 -2)

@@ -25,7 +25,7 @@
 
 
 project = 'lambeq'
-copyright = '2021-2022 Cambridge Quantum Computing Ltd.'
+copyright = '2021-2023 Cambridge Quantum Computing Ltd.'
 author = 'Cambridge Quantum QNLP Dev Team'
 
 # -- General configuration ---------------------------------------------------
@@ -34,9 +34,9 @@
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
 extensions = [
-    'm2r2',
     'nbsphinx',
     'numpydoc',
+    'sphinx_mdinclude',
     'sphinx.ext.autodoc',
     'sphinx.ext.viewcode',
     'sphinx.ext.graphviz',

Large notebook diffs (not rendered by default):

docs/examples/classical_pipeline.ipynb (+11 -13)
docs/examples/pennylane.ipynb (+675)
docs/examples/quantum_pipeline.ipynb (+23 -23)
docs/examples/quantum_pipeline_jax.ipynb (+29 -18)
docs/examples/tokenisation.ipynb (+181)

docs/glossary.rst (+4 -1)

@@ -59,6 +59,9 @@ Glossary
    IQP circuit
       Instantaneous Quantum Polynomial. A circuit which interleaves layers of Hadamard :term:`quantum gates <quantum gate>` with diagonal unitaries.
 
+   loss function
+      In machine learning, a function that estimates how far the prediction of a :term:`model` is from its true value. The purpose of training is to minimise the loss over the training set.
+
    matrix product state (MPS)
       A factorization of a large tensor into a chain-like product of smaller tensors. ``lambeq`` is equipped with :term:`ansätze <ansatz (plural: ansätze)>` that implement various forms of matrix product states, allowing the execution of large :term:`tensor networks <tensor network>` on classical hardware.
 
@@ -81,7 +84,7 @@ Glossary
       A statistical tool that converts a sentence into a hierarchical representation that reflects the syntactic relationships between the words (a :term:`syntax tree`) based on a specific grammar formalism.
 
    PennyLane
-      A Python library for differentiable programming of quantum computers, developed by Xanadu, enabling quantum machine learning.
+      A Python library for differentiable programming of quantum computers, developed by Xanadu, enabling quantum machine learning. See more `here <https://pennylane.ai/qml/>`_.
 
    post-selection
       The act of conditioning the probability space on a particular event. In practice, this involves disregarding measurement outcomes where a particular qubit does not match the post-selected value.

docs/index.rst (+12)

@@ -23,6 +23,8 @@ User support
 
 If you need help with ``lambeq`` or you think you have found a bug, please send an email to lambeq-support@cambridgequantum.com. You can also open an issue at ``lambeq``'s `GitHub repository <https://github.com/CQCL/lambeq>`_. Someone from the development team will respond to you as soon as possible. Furthermore, if you want to subscribe to ``lambeq``'s mailing list (lambeq-users@cambridgequantum.com), send an email to lambeq-support@cambridgequantum.com to let us know.
 
+Note that the best way to get in touch with the QNLP community and learn about ``lambeq`` is to join our `QNLP discord server <https://discord.gg/TA63zghMrC>`_, where you can ask questions, get notified about important announcements and news, and chat with other QNLP researchers.
+
 Licence
 -------
 
@@ -54,6 +56,16 @@ If you use ``lambeq`` for your research, please cite the accompanying paper [Kea
    use-cases
    CONTRIBUTING
 
+.. toctree::
+   :caption: NLP-101
+   :maxdepth: 2
+
+   nlp-intro
+   nlp-data
+   nlp-class
+   nlp-ml
+   nlp-refs
+
 .. toctree::
    :caption: Tutorials
    :maxdepth: 2

docs/models.rst (+56)

@@ -47,6 +47,62 @@ To use the :py:class:`.NumpyModel` with ``jit`` mode, you need to install ``lamb
 
 - :ref:`uc1`
 
+.. _sec-pennylanemodel:
+
+PennyLaneModel
+--------------
+
+:py:class:`.PennyLaneModel` uses :term:`PennyLane` and :term:`PyTorch` to allow classical-quantum machine learning experiments. With ``probabilities=False``, :py:class:`.PennyLaneModel` performs a state vector simulation, while with ``probabilities=True`` it performs a probability simulation. The state vector and probability simulations correspond to DisCoPy's unitary and density matrix simulations.
+
+To run the model on real quantum hardware, ``probabilities=True`` must be used, so that the ``lambeq`` circuits are optimized using the parameter-shift rule to calculate the gradients.
+
+:py:class:`.PennyLaneModel` can be used to optimize simulated circuits using exact backpropagation with PyTorch, which may give improved results over using :py:class:`.NumpyModel` with :py:class:`.SPSAOptimizer`. However, this optimization process is not possible on real quantum hardware, so for more realistic results the parameter-shift rule should be preferred.
+
+To construct a hybrid model that passes the output of a circuit through a classical neural network, it is only necessary to subclass :py:class:`.PennyLaneModel` and modify the :py:meth:`~.PennyLaneModel.__init__` method to store the classical PyTorch parameters, and the :py:meth:`~.PennyLaneModel.forward` method to pass the result of :py:meth:`~.PennyLaneModel.get_diagram_output` to the neural network. For example:
+
+.. code-block:: python
+
+    import torch
+    from lambeq import PennyLaneModel
+
+    class MyCustomModel(PennyLaneModel):
+        def __init__(self, **kwargs):
+            super().__init__(**kwargs)
+            self.net = torch.nn.Linear(2, 2)
+
+        def forward(self, input):
+            preds = self.get_diagram_output(input)
+            return self.net(preds)
+
+This neural net can be real- or complex-valued, though this affects the non-linearities that can be used.
+
+:py:class:`.PennyLaneModel` can be used with the :py:class:`.PytorchTrainer`, or a standard PyTorch training loop.
+
+By using different backend configurations, :py:class:`.PennyLaneModel` can be used for several different use cases, listed below:
+
+.. _tbl-plane-usecases:
+.. csv-table:: Backend configurations for different use cases.
+   :header: "Use case", "Configurations"
+   :widths: 25, 50
+
+   "Exact non :term:`shot-based <shots>` simulation with state outputs", "``{'backend': 'default.qubit', 'probabilities'=False}``"
+   "Exact non shot-based simulation with probability outputs", "``{'backend': 'default.qubit', 'probabilities'=True}``"
+   "Noiseless shot-based simulation", "``{'backend': 'default.qubit', 'shots'=1000, 'probabilities'=True}``"
+   "Noisy shot-based simulation on local hardware", "``{'backend': 'qiskit.aer', noise_model=my_noise_model, 'shots'=1000, 'probabilities'=True}``, where ``my_noise_model`` is an AER :py:class:`NoiseModel`."
+   "Noisy shot-based simulation on cloud-based emulators", "| ``{'backend': 'qiskit.ibmq', 'device'='ibmq_qasm_simulator', 'shots'=1000, 'probabilities'=True}``
+   | ``{'backend': 'honeywell.hqs', device=('H1-1E' or 'H1-2E'), 'shots'=1000, 'probabilities'=True}``"
+   "Evaluation of quantum circuits on a quantum computer", "| ``{'backend': 'qiskit.ibmq', 'device'='ibmq_hardware_device', 'shots'=1000, 'probabilities'=True}``, where ``ibmq_hardware_device`` is one that you have access to via your IBMQ account.
+   | ``{'backend': 'honeywell.hqs', device=('H1' or 'H1-1' or 'H1-2'), 'shots'=1000, 'probabilities'=True}``"
+
+All of these backends are compatible with hybrid quantum-classical models. Note that quantum hardware and cloud-based emulators are much slower than local simulations.
+
+.. rubric:: See also the following use cases:
+
+- :ref:`uc1`
+- :ref:`uc2`
+- :ref:`uc3`
+- :ref:`uc5`
 
 .. _sec-pytorchmodel:
 
 PytorchModel
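
As a rough editorial sketch to complement the new section above (not part of the commit), the following shows an end-to-end run: two toy sentences are parsed, converted into circuits, and a plain ``PennyLaneModel`` is trained with a standard PyTorch loop. The sentences, labels and ansatz settings are invented, and the ``probabilities``/``backend_config`` keyword arguments are assumed to follow the backend-configuration table above rather than being guaranteed by this diff.

.. code-block:: python

    import torch
    from lambeq import AtomicType, BobcatParser, IQPAnsatz, PennyLaneModel

    # Toy training data: two sentences with one-hot binary labels.
    sentences = ['Alice prepares qubits.', 'Bob runs software.']
    labels = torch.tensor([[1.0, 0.0], [0.0, 1.0]])

    # Parse the sentences and turn the diagrams into parameterised circuits.
    parser = BobcatParser()
    diagrams = [parser.sentence2diagram(s) for s in sentences]
    ansatz = IQPAnsatz({AtomicType.NOUN: 1, AtomicType.SENTENCE: 1}, n_layers=1)
    circuits = [ansatz(d) for d in diagrams]

    # Exact (non shot-based) simulation with probability outputs.
    model = PennyLaneModel.from_diagrams(
        circuits,
        probabilities=True,
        backend_config={'backend': 'default.qubit'})
    model.initialise_weights()

    # Standard PyTorch training loop with exact backpropagation.
    optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(10):
        optimizer.zero_grad()
        preds = model(circuits)                    # one probability pair per sentence
        loss = loss_fn(preds, labels.to(preds.dtype))
        loss.backward()
        optimizer.step()

The same loop would work for a subclass such as the ``MyCustomModel`` shown in the diff, since its ``forward`` method also returns a tensor of predictions.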

docs/nlp-class.rst (+70, new file)

Text classification
===================

One of the most fundamental tasks in NLP is text classification, which involves categorising textual data into predefined categories. It plays a vital role in a variety of NLP applications, including sentiment analysis, spam detection, topic modelling, and language identification, among others. By categorising texts into relevant categories, machines can analyse and derive insights from large volumes of textual data, making it possible to automate decision-making processes and perform tasks that would otherwise be time-consuming or impossible for humans to do.

Binary vs multi-class classification
------------------------------------

Binary classification and multi-class classification both involve assigning a label or category to an input data point. In `binary classification`, there are only two possible output categories, and the goal is to classify input data points into one of these two categories; for example, classifying emails as spam or not spam.

On the other hand, `multi-class classification` involves assigning a data point to one of more than two possible output categories; for example, classifying images of animals into categories such as cats, dogs, and birds.

Multi-class classification problems can be further divided into two subcategories: multi-class `single-label` classification and multi-class `multi-label` classification. In multi-class single-label classification, each input data point is assigned to one and only one output category. In contrast, in multi-class multi-label classification, each input data point can be assigned to one or more output categories simultaneously.

In general, binary classification is a simpler and more straightforward problem to solve than multi-class classification, but multi-class classification problems are more representative of real-world scenarios, where there are multiple possible categories to which a data point could belong.
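
The three settings above differ mainly in how the target labels are encoded. As a purely editorial illustration (not part of the new file), assuming a three-class example with the classes cat, dog and bird:

.. code-block:: python

    import numpy as np

    # Binary classification: one scalar label per data point (1 = spam, 0 = not spam).
    binary_labels = np.array([1, 0, 0, 1])

    # Multi-class, single-label: exactly one of the classes (cat, dog, bird)
    # per data point, typically one-hot encoded.
    single_label = np.array([[1, 0, 0],    # cat
                             [0, 0, 1]])   # bird

    # Multi-class, multi-label: a data point may belong to several classes
    # at once, so the encoding is multi-hot.
    multi_label = np.array([[1, 1, 0],     # cat and dog
                            [0, 1, 1]])    # dog and bird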

Loss functions
--------------

For binary classification tasks, the loss function of choice is binary cross-entropy. Below, :math:`y_i` is the true label for the :math:`i` th data point, :math:`p(y_i)` represents the probability that the model assigns to the specific label, and :math:`N` is the number of data points.

.. math::

   H(p, q) = -\frac{1}{N}\sum_{i=1}^N [y_i \log(p(y_i)) + (1-y_i) \log(1-p(y_i))]

For multi-class classification, the loss function is usually the categorical version of cross-entropy. Here, :math:`M` is the number of classes, :math:`p(x_i)` is the true probability for the :math:`i` th class, and :math:`q(x_i)` the probability predicted by the model.

.. math::

   H(p, q) = -\sum_{i=1}^M p(x_i) \log(q(x_i))

.. note::

   ``lambeq`` provides a number of loss functions that can be used out-of-the-box during training, such as :py:class:`~.BinaryCrossEntropyLoss`, :py:class:`~.CrossEntropyLoss`, and :py:class:`~.MSELoss`.
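
As a quick editorial illustration (not part of the new file), both formulas can be computed directly with numpy; the labels and predicted probabilities below are made up:

.. code-block:: python

    import numpy as np

    # Binary cross-entropy: true labels y_i and predicted probabilities p(y_i).
    y = np.array([1, 0, 1, 1])
    p = np.array([0.9, 0.2, 0.7, 0.6])
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Categorical cross-entropy for a single data point: true distribution p(x)
    # (one-hot) and predicted distribution q(x) over M = 3 classes.
    p_true = np.array([0.0, 1.0, 0.0])
    q_pred = np.array([0.1, 0.8, 0.1])
    cce = -np.sum(p_true * np.log(q_pred))

    print(f'binary cross-entropy: {bce:.4f}')        # ~0.299
    print(f'categorical cross-entropy: {cce:.4f}')   # ~0.223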
.. _sec-evaluation:

Evaluation metrics
------------------

The most common metrics to evaluate the performance of classification models are accuracy, precision, recall, and F-score. Each metric has its own strengths and weaknesses, and can be useful in different contexts.

- `Accuracy` is usually the standard way to evaluate classification, and it measures how often the model correctly predicts the class of an instance. It is calculated as the ratio of correct predictions to the total number of predictions. This metric can be useful when the classes in the dataset are balanced, meaning that there are roughly equal numbers of instances in each class. In this case, accuracy can provide a good overall measure of how well the model is performing.

  .. math::

     \text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{True Positives} + \text{True Negatives} + \text{False Positives} + \text{False Negatives}}

- `Precision` is the proportion of true positive predictions among all positive predictions. It is expressed as the ratio of true positives to the total number of instances that the model predicts as positive. Precision is useful when the cost of false positives is high, such as in spam filtering or legal decision making.

  .. math::

     \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}

- `Recall`, also known as `sensitivity`, is the proportion of true positive predictions among all actual positive instances in the dataset. Recall is calculated as the ratio of true positives to the total number of actual positive instances. It can be helpful when the goal of the model is to identify all instances of a particular class, such as in medical diagnosis or fraud detection.

  .. math::

     \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}

These two measures can be competing, in the sense that increasing precision can decrease recall and vice versa. This trade-off occurs because precision and recall measure different aspects of the model's performance. High precision means that the model is accurate in its positive predictions, but it may miss some true positive instances, leading to lower recall. On the other hand, high recall means that the model identifies most of the positive instances, but it may have more false positives, leading to lower precision.

To address this, researchers use the `F-score`, also known as the `F1` score, which is a combined measure of precision and recall. It is calculated as the harmonic mean of precision and recall and provides a way to balance these two metrics. The F-score is useful when both precision and recall are important, and can be used to compare models that have different trade-offs between these two metrics.

.. math::

   \text{F-score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}

.. note::

   For examples of text classification with ``lambeq``, see the :ref:`Training tutorial <sec-training>`.
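
As an editorial illustration (again, not part of the new file), all four metrics can be computed from the confusion-matrix counts of a toy set of binary predictions:

.. code-block:: python

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

    # Confusion-matrix counts.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    accuracy = (tp + tn) / (tp + tn + fp + fn)                 # 0.625
    precision = tp / (tp + fp)                                 # 0.6
    recall = tp / (tp + fn)                                    # 0.75
    f_score = 2 * precision * recall / (precision + recall)    # ~0.667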
