Skip to content

Update CMake build and dlib version #221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

peter277
Copy link
Contributor

@peter277 peter277 commented Mar 28, 2025

This pull request modernises MITIE's build process by improving shared library compilation, adding pkg-config and CMake package support, and modularising CMake configuration with new optionality, making packaging and distribution more flexible while preserving original behaviour by default.

mitielib : fix shared library compilation for C++ API

  • Add MITIE_API prefixes to C++ functions and classes to ensure symbols are exported to shared libraries
  • Rename MITIE_EXPORT->MITIE_API in mitie.h header
  • Convert const variables in conll_parser and ner_feature_extraction headers to constexpr

CMake build process uplift

  • Create CMakeLists for overarching project root folder and modularise build options (aiming to set defaults which align to original behaviour)
  • Update mitielib CMakeLists to assume it is called from parent project CMakeLists and implement new build options
  • Ability to build against shared dlib (set option MITIE_INTEGRATED_DLIB=OFF)
  • Add ability to prepend mitie- prefix to tools and examples executable output files
  • Configure CMake install directives for all targets: MITIE library and headers, C/C++ based tools and examples
  • Build options for examples:
    • Separate build options for C and C++ examples
    • Option to switch _example suffix on/off
  • Omit lib prefix on Windows MinGW

dlib

  • merge_changes_from_dlib : add latest version tag detection logic

…name clashes when using later versions of C++ standard
* Add MITIE_API prefixes to C++ functions and classes to ensure symbols are exported to shared libraries
* Rename MITIE_EXPORT->MITIE_API in mitie.h header
* Convert const variables in conll_parser and ner_feature_extraction headers to constexpr
* Create CMakeLists for overarching project root folder and modularise build options (aiming to set defaults which align to original behaviour)
* Update mitielib CMakeLists to assume it is called from parent project CMakeLists and implement new build options
* Add ability to prepend mitie- prefix to tools and examples executable output files
…option to switch _example suffix on/off, error if system-installed dlib not found when MITIE_INTEGRATED_DLIB=OFF, omit lib prefix on Windows MinGW
@peter277 peter277 force-pushed the update-cmake-build-and-dlib-version branch from ff47ba2 to 24f860c Compare March 30, 2025 04:44
@davisking
Copy link
Contributor

Thanks for the PR, but I don't want to change a bunch of stuff that isn't causing users problems. Especially build system scripts. Those have worked for years across many platforms. Updates to stuff like that tends to break some subset of users. The same goes for updating dlib. MITIE doesn't need the updated dlib so updating it won't make MITIE better but might break some users using older compilers.

And the PR is way too big :)

peter277 added 2 commits April 5, 2025 19:47
Add directives for mitie.pc, mitie-config.cmake and mitie-version.cmake metadata generation and install rules for package discovery
@peter277 peter277 force-pushed the update-cmake-build-and-dlib-version branch from 24f860c to 13ed622 Compare April 5, 2025 08:52
@peter277
Copy link
Contributor Author

peter277 commented Apr 5, 2025

Thanks for the PR, but I don't want to change a bunch of stuff that isn't causing users problems. Especially build system scripts. Those have worked for years across many platforms. Updates to stuff like that tends to break some subset of users. The same goes for updating dlib. MITIE doesn't need the updated dlib so updating it won't make MITIE better but might break some users using older compilers.

And the PR is way too big :)

Thanks for taking the time to review the pull request. I've removed the dlib version update to reduce noise.

I've been compiling MITIE in an MSYS2 Windows environment while packaging and installing it as a shared library (alongside the tools and examples). Part of the changes are fixes to errors I was experiencing, such as linking errors with C++ tools and examples due to missing symbol exports. To resolve this, I introduced an MITIE_API macro and applied it to the relevant C++ classes. Example errors:

# Example error when linking C++ examples with shared MITIE
[14/14] Linking CXX executable train_relation_extraction_example.exe
FAILED: train_relation_extraction_example.exe
...
undefined reference to `mitie::binary_relation_detector_trainer::binary_relation_detector_trainer
...
undefined reference to `mitie::binary_relation_detector_trainer::add_positive_binary_relation
...[etc]

Other changes are primarily optional modifications that aim to preserve the current behaviour by default. I can separate these changes into smaller pull requests if that would be preferable. Could you please advise on which modifications you consider most viable for merging?

For example, I assume fixing the C++ shared library prefixes is viable since current code breaks certain compilation scenarios (like on my MSYS2 Windows setup), however what about:

  • adding options for 'mitie-' prefix for tools and examples executables and/or suppressing '_example' suffix (useful if distributing executables as part of a package);
  • option to look for and link against shared system dlib (e.g. MSYS2 already has dlib);
  • adding pkg-config and CMake Config (alternatively could be shipped with MSYS2 PKGBUILD instead of being merged into MITIE master);
  • CMake install directives for tools and executables;
  • a central mitie_project CMakeLists.txt to enable configuring and building the library, tools and examples in one step.

Additionally, it might be beneficial to establish some guidelines on the minimum supported tooling and language versions for future development. For example, could we consider requiring C++11 (or newer, if not already the minimum) and setting a minimum CMake version of 3.10 or 3.12? This would help eliminate the deprecation warnings we currently see with CMake 2.4/2.6, while ensuring that those who depend on older configurations can still use the existing stable releases.

@davisking
Copy link
Contributor

davisking commented Apr 5, 2025

Well, why does the C++ stuff need to be exposed? MITIE is designed specifically to hide those C++ symbols and expose only a C API, which is then used via python or whatever when built as a shared library. Making shared libraries that expose C++ symbols is extremely complicated and basically doesn't work with normal C++ code on some platforms (e.g. mainly Windows, where just adding export statements to the classes is not going to work, the whole C++ API needs to be designed around the goal of making it work, which it isn't here.).

Related to that, why are you trying to build C++ code with the MITIE code compiled into a shared library rather than building them the recommended way?

@peter277
Copy link
Contributor Author

peter277 commented Apr 10, 2025

I'm packaging MITIE for MSYS2 and wanting to use it in that environment, where libraries are generally compiled into DLLs (e.g. MSYS2's dlib is shared). It would also be more consistent with behaviour on Linux platforms where the symbols are exported fully by default in the current configuration. Additionally it also keeps the package with tools and examples smaller, but if there are strong reasons to keep the library static then I suppose this would probably be acceptable. I do note that the recommendation to build as static when using the C++ API is not clearly documented in the readme file and mitielib's CMakeLists.txt is configured to build as shared by default.

I am aware that on Windows C++ DLLs generally need to be recompiled for each compiler, and there are potentially name mangling and ABI incompatibilities. However, MITIE is open source and MSYS2 is a relatively self-contained ecosystem, so this doesn't seem like a huge problem. Interested to know your views on whether there are deeper design constraints beyond this that need to be considered that might make a shared library fundamentally unsafe or unmanageable?

Would an opt-in approach through a CMake option MITIE_BUILD_SHARED_CPP_API (defaulting to OFF) help address your concerns?

Most of the other suggestions could be deferred to the packaging process, but the fix for the C++ shared API would be beneficial to incorporate into the master repo (I'm not a fan of enabling an option to export all symbols like on Linux). Additionally, the option to build against a system-installed dlib would ideally be included as well.

@davisking
Copy link
Contributor

The problem though is on windows, dlls (ones made with visual studio at least, I'm not sure what MSYS2 ends up doing) have these memory allocation boundaries. So for instance, you can't have a function in a dll that's does this:

std::vector<int> make_me_a_vector();

since what happens is someone calls make_me_a_vector(), it runs the code in your dll right. That dll has its own separate instance of new that uses a separate data structure to manage the memory new manages. So you call it, in your application, which gets a std::vector. Then in your application at some point you are done with the std::vector you got from make_me_a_vector() and your code runs std::vector::~vector() which calls delete on the memory. But that's using the delete in your dll (or application) that is going to try and put the memory block back into a memory management structure that isn't the one it was allocated from. At which point your application crashes or goes haywire.

Linux systems aren't setup this way, so shared libraries there are easy and work fine. But on windows, C++ shared libraries are hugely problematic because of this. You have to somehow construct the API such that client code never itself deletes memory that was allocated in your dll. That's the issue and it requires a carefully designed API to avoid that. Which hardly anyone does. Dlib certainly doesn't do that.

The main MITIE API though, the C API, is explicitly designed for this so that you can use it as a dll. It's how the python tooling uses MITIE for instance. So that works well. But trying to get the C++ code in MITIE To work inside a dll is inevitably going to be a disaster because of this "separate memory allocation regions" thing windows dlls do.

It's also why symbol export is off by default in most dll creation tools. Since it is so wildly dangerous and buggy to try and use C++ symbols across dll boundaries, specifically due to this issue.

@davisking
Copy link
Contributor

davisking commented Apr 12, 2025

Windows DLLs are such a disaster. I was looking for some more reference on this, there used to be a good article that explained it but I'm not finding it. But for instance, https://devblogs.microsoft.com/oldnewthing/20060915-04/?p=29723 https://devblogs.microsoft.com/oldnewthing/20060915-04/?p=29723, https://docs.omniverse.nvidia.com/kit/docs/carbonite/latest/docs/memory/Memory.html etc. You can find a lot written about it.

The TLDR is that any C++ code that is going to be the API for a DLL needs to be very carefully constructed around this limitation of DLLs. Normal C++ code can't be used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants