Skip to content
This repository was archived by the owner on Mar 24, 2025. It is now read-only.
/ syntree-generator Public archive

A tool for converting French literary text into S-expression syntax trees for linguistic analysis, with visualization capabilities

Notifications You must be signed in to change notification settings

jwalsh/syntree-generator

Repository files navigation

syntree-generator

Syntree Generator

A tool for converting French literary text into S-expression syntax trees for linguistic analysis, with visualization capabilities.

./static/screenshots/syntax-tree-ui.png

Overview

Syntree Generator converts natural language text into structured Abstract Syntax Trees (ASTs) represented in S-expression format. It’s particularly optimized for analyzing French literary texts, such as Proust’s works, by mapping syntactic dependencies to constituent structure.

The tool breaks down sentences into their grammatical components (nouns, verbs, phrases, clauses) and generates formal representations that can be visualized and studied. It includes a web UI for interactive exploration of the syntax trees and supports extracting samples for use with external visualization tools.

Features

  • Converts text to constituency-based syntax trees in S-expression format
  • Optimized for French literary texts with special focus on complex syntactic structures
  • Web-based visualization of syntax trees
  • Integration with spaCy for linguistic analysis
  • Sample extraction for use with external tools
  • Command-line interface for batch processing
  • Configurable chunking for processing large texts
  • Emacs integration for syntax highlighting and advanced tree visualization

Installation

The project uses Poetry for dependency management.

# Clone the repository
git clone https://github.com/jwalsh/syntree-generator.git
cd syntree-generator

# Install dependencies
make setup

Usage

Basic usage

To parse a text file and generate S-expressions:

# Using the Python CLI directly
python -m syntree_generator.cli path/to/input.txt -o path/to/output.lisp

# Or using make
make run INPUT_FILE=data/pg15288.txt OUTPUT_FILE=output/proust.lisp

Getting samples

To generate a sample of S-expressions that can be easily loaded into the S-expression Grammar Analyzer:

make samples SAMPLE_SIZE=5

This will create a file with the extension .sample.lisp containing 5 sample S-expressions.

Visualization

To view the syntax trees in the web UI:

make serve
# Then visit http://localhost:8765 in your browser

Advanced Emacs Integration

For users with Emacs, the project provides enhanced syntax highlighting and visualization:

# Copy the provided .emacs.d/init.el or load publish.el
emacs -l publish.el

# Then open any .lisp file in the examples directory
# Use C-c t to visualize the tree structure

Literary Texts

The repository includes org-mode scripts to download various French literary texts for testing and analysis:

# Download all texts
make download-texts

# Process specific texts
make run INPUT_FILE=data/pg2650.txt OUTPUT_FILE=output/swann.lisp
make run INPUT_FILE=data/pg6099.txt OUTPUT_FILE=output/baudelaire.lisp

Examples

The examples/ directory contains S-expression examples at various complexity levels:

  • Simple examples with basic subject-verb structures
  • Medium examples with prepositional phrases and modifiers
  • Complex examples with relative/subordinate clauses
  • Very complex examples typical of Proust’s style

Documentation

The documentation is available in the org-mode files and can be published to HTML:

# Generate all documentation
make docs

# Serve and view the documentation
make serve
# Then visit http://localhost:8765 in your browser

Development

Running Tests

make test

Code Formatting

make format

Capturing Screenshots

# Setup shot-scraper
./setup-shot-scraper.sh

# Capture screenshots of the web UI
make screenshots

License

MIT License

Copyright (c) 2025 Jason Walsh

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.