Tiago Tresoldi

Letters. Programming. Evolution. &c.

I develop open-source tools for computational linguistics and data analysis. All projects are available on GitHub.

Python Libraries

nhandu

Literate programming for Python: write executable documents in plain .py files. nhandu transforms Python files with Markdown comments into reproducible reports with captured output, plots, and rich object rendering, offering a Git-friendly alternative to Jupyter notebooks.

Repository: https://github.com/tresoldi/nhandu
PyPI: https://pypi.org/project/nhandu/

dafsa

Library for Deterministic Acyclic Finite State Automata (DAFSAs), supporting efficient string matching and morphological analysis.

Repository: https://github.com/tresoldi/dafsa
PyPI: https://pypi.org/project/dafsa/
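For readers unfamiliar with the data structure, here is a minimal pure-Python sketch of the idea behind a DAFSA. It does not use the dafsa package or its API; the class and function names are hypothetical. It builds a trie and then merges states that accept the same set of suffixes, so that shared endings such as the "s" in "taps" and "tops" are stored only once.

```python
from typing import Dict, List, Tuple


class Node:
    """A state in the automaton: labeled outgoing edges plus a final flag."""
    __slots__ = ("edges", "final")

    def __init__(self) -> None:
        self.edges: Dict[str, "Node"] = {}
        self.final = False


def build_trie(words: List[str]) -> Node:
    """Insert every word into a prefix tree (hypothetical helper)."""
    root = Node()
    for word in words:
        node = root
        for char in word:
            node = node.edges.setdefault(char, Node())
        node.final = True
    return root


def minimize(root: Node) -> Node:
    """Merge equivalent states bottom-up, turning the trie into a DAFSA.

    Mutates the trie in place (hypothetical helper, not the dafsa API).
    """
    registry: Dict[Tuple, Node] = {}

    def visit(node: Node) -> Node:
        # Canonicalize children first so their identities can be compared.
        for char, child in list(node.edges.items()):
            node.edges[char] = visit(child)
        # Two states are equivalent if they agree on finality and have the
        # same labeled edges leading to the same canonical children.
        signature = (node.final,
                     tuple(sorted((c, id(n)) for c, n in node.edges.items())))
        if signature not in registry:
            registry[signature] = node
        return registry[signature]

    return visit(root)


def accepts(root: Node, word: str) -> bool:
    """Follow edges character by character; accept only on a final state."""
    node = root
    for char in word:
        if char not in node.edges:
            return False
        node = node.edges[char]
    return node.final


def count_states(root: Node) -> int:
    """Count distinct reachable states (shared states are counted once)."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.edges.values())
    return len(seen)
```

With the words tap, taps, top, and tops, the trie has 8 states and the minimized automaton has 5, while accepting exactly the same words. Building the full trie first keeps the sketch short; incremental construction algorithms avoid that intermediate memory cost.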

ngesh

Library for generating synthetic phylogenetic data for testing and validation of computational methods.

Repository: https://github.com/tresoldi/ngesh
PyPI: https://pypi.org/project/ngesh/
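The core idea of simulating phylogenies can be illustrated with a pure-birth (Yule) process, sketched below using only the standard library. This is not ngesh's implementation or API: the function names and the birth_rate parameter are hypothetical, and the sketch models neither extinction nor character evolution.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    length: float = 0.0                      # branch length leading to this node
    children: List["TreeNode"] = field(default_factory=list)
    name: Optional[str] = None


def simulate_yule(n_leaves: int, birth_rate: float = 1.0,
                  seed: Optional[int] = None) -> TreeNode:
    """Grow a tree under a pure-birth process until it has n_leaves tips."""
    if n_leaves < 1:
        raise ValueError("n_leaves must be positive")
    rng = random.Random(seed)                # seeded for reproducible fixtures
    root = TreeNode()
    active = [root]                          # lineages that can still split
    while True:
        # With k lineages, the waiting time to the next split is Exp(k * rate).
        wait = rng.expovariate(len(active) * birth_rate)
        for lineage in active:
            lineage.length += wait
        if len(active) == n_leaves:
            break
        # Split a uniformly chosen lineage into two daughter lineages.
        parent = active.pop(rng.randrange(len(active)))
        parent.children = [TreeNode(), TreeNode()]
        active.extend(parent.children)
    for i, leaf in enumerate(active, start=1):
        leaf.name = f"L{i}"
    return root


def to_newick(node: TreeNode) -> str:
    """Serialize a (sub)tree to Newick, without the trailing semicolon."""
    if node.children:
        label = "(" + ",".join(to_newick(c) for c in node.children) + ")"
    else:
        label = node.name or ""
    return f"{label}:{node.length:.4f}"
```

Calling to_newick(simulate_yule(5, seed=42)) + ";" yields a Newick string with five tips named L1 to L5. Fixing the seed makes the tree reproducible, which is what makes synthetic data useful as a test fixture for phylogenetic methods.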

freqprob

Tools for estimating probabilities from frequency counts in linguistic datasets.

Repository: https://github.com/tresoldi/freqprob
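One of the simplest estimators of this kind is additive (Lidstone) smoothing, which adds a constant gamma to every count so that unseen outcomes still receive some probability mass. The sketch below is illustrative only and is not freqprob's API; the function name and the gamma and bins parameters are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Mapping


def lidstone(counts: Mapping[Hashable, int], bins: int,
             gamma: float = 1.0) -> Callable[[Hashable], float]:
    """Return P(x) = (c(x) + gamma) / (N + gamma * B).

    gamma=1 gives Laplace smoothing and gamma=0 the maximum-likelihood
    estimate. B (bins) is the number of possible outcomes, including unseen
    ones, so that the probabilities over all B outcomes sum to 1.
    """
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if bins < len(counts):
        raise ValueError("bins cannot be smaller than the number of observed types")
    denominator = sum(counts.values()) + gamma * bins

    def prob(outcome: Hashable) -> float:
        return (counts.get(outcome, 0) + gamma) / denominator

    return prob


if __name__ == "__main__":
    counts = Counter("abracadabra")    # a:5, b:2, r:2, c:1, d:1 (N = 11)
    p = lidstone(counts, bins=6)       # assume one unseen outcome, e.g. "z"
    print(p("a"))                      # 6/17 ≈ 0.353
    print(p("z"))                      # 1/17, nonzero despite never being seen
```

The choice of bins matters: if it omits unseen outcomes, the mass handed to them is not accounted for and the distribution no longer sums to 1.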

asymcat

Statistical analysis of asymmetric categorical data with applications in typological research.

Repository: https://github.com/tresoldi/asymcat
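Asymmetric measures matter because the association between two categorical variables need not be the same in both directions. As an illustration, here is the uncertainty coefficient (Theil's U), one standard asymmetric measure, computed from co-occurrence pairs. It is a sketch with hypothetical function names, not asymcat's API or its full set of measures.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Mapping, Tuple


def entropy(counts: Mapping[Hashable, int]) -> float:
    """Shannon entropy in bits of the distribution implied by the counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def theil_u(pairs: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """U(Y|X): the fraction of the uncertainty in Y removed by knowing X (0..1)."""
    pairs = list(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    xy_counts = Counter(pairs)
    h_y = entropy(y_counts)
    if h_y == 0:
        return 1.0  # Y is constant, so it is trivially predictable
    # Chain rule: H(Y|X) = H(X, Y) - H(X)
    h_y_given_x = entropy(xy_counts) - entropy(x_counts)
    return (h_y - h_y_given_x) / h_y


if __name__ == "__main__":
    # Hypothetical data: each x determines its y, but not the other way round.
    pairs = [("a", "V"), ("b", "C"), ("c", "C"), ("d", "C")]
    print(theil_u(pairs))                       # U(Y|X) = 1.0
    print(theil_u([(y, x) for x, y in pairs]))  # U(X|Y) ≈ 0.41
```

In the example, each x predicts its y perfectly, while knowing y leaves much of the uncertainty about x; a symmetric measure such as chi-square would report a single value and hide that asymmetry.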

C Tools

acopost

Part-of-speech tagging system implemented in C for efficient text processing.

Repository: https://github.com/tresoldi/acopost

Data Resources

Arca Verborum

Comprehensive lexical database for computational historical linguistics: analysis-ready datasets derived from Lexibank, with 149 collections covering 2.9 million forms across more than 9,700 languages. It provides denormalized CSV files optimized for rapid method development and teaching.

Repository: https://github.com/tresoldi/arcaverborum

Philosophy

My software development follows these principles:

Installation & Usage

Most of the Python packages above can be installed with pip, replacing package-name with the project's name (e.g., dafsa):

pip install package-name

See individual repositories for specific installation instructions and usage examples.
