Apache StormCrawler (Incubating) Migration Guide
Introduction
This guide provides step-by-step instructions for migrating your project from older versions of StormCrawler to the new version under the Apache umbrella. Key changes include updates to the group and artifact IDs, as well as the removal of the Elasticsearch module.
Group ID and Artifact ID Changes
Group ID
The group ID has changed from com.digitalpebble.stormcrawler to org.apache.stormcrawler. This change reflects the project's transition to the Apache Software Foundation.
Artifact ID
The artifact ID has changed from storm-crawler to stormcrawler.
Maven Configuration
Update your pom.xml to reflect these changes. Below is an example of the updated dependency configuration:
Old Configuration:
<dependency>
    <groupId>com.digitalpebble.stormcrawler</groupId>
    <artifactId>storm-crawler</artifactId>
    <version>OLD_VERSION</version>
</dependency>
New Configuration:
<dependency>
    <groupId>org.apache.stormcrawler</groupId>
    <artifactId>stormcrawler</artifactId>
    <version>NEW_VERSION</version>
</dependency>
Replace OLD_VERSION with the version you are currently using and NEW_VERSION with the latest version of Apache StormCrawler.
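If your project pulls in more than one StormCrawler artifact, one common way to keep their versions aligned is a Maven property. The fragment below is a minimal sketch rather than a required layout; the property name stormcrawler.version is an assumption chosen for illustration, and NEW_VERSION remains a placeholder.

```xml
<properties>
    <!-- Hypothetical property name; set it to the Apache StormCrawler release you target -->
    <stormcrawler.version>NEW_VERSION</stormcrawler.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.stormcrawler</groupId>
        <artifactId>stormcrawler</artifactId>
        <!-- Resolved from the property above, so a later upgrade touches one line -->
        <version>${stormcrawler.version}</version>
    </dependency>
</dependencies>
```

Any additional StormCrawler dependency can then reference ${stormcrawler.version} instead of repeating the version number.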
Removal of Elasticsearch Module
The Elasticsearch module has been removed in the latest version of StormCrawler. You have two options to handle this change:
- Fork the Elasticsearch Module: You can fork the Elasticsearch module from the older version of StormCrawler and maintain it independently.
- Migrate to OpenSearch Module: Alternatively, you can migrate your code to use the OpenSearch module provided by Apache StormCrawler.
Forking the Elasticsearch Module
- Clone the repository containing the last version of StormCrawler that includes the Elasticsearch module.
- Copy the Elasticsearch module into your project's repository.
- Update your project's dependencies to include this local version of the Elasticsearch module.
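As a rough sketch of the last step, the dependency on a forked module might look like the fragment below once the fork has been built and installed into your local repository (for example with mvn install). The coordinates com.example.crawler, stormcrawler-elasticsearch-fork, and 1.0.0-SNAPSHOT are hypothetical placeholders; use whatever coordinates you assign to your fork.

```xml
<dependency>
    <!-- Hypothetical coordinates for your own fork; not published by Apache StormCrawler -->
    <groupId>com.example.crawler</groupId>
    <artifactId>stormcrawler-elasticsearch-fork</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
```

The fork's own pom.xml will most likely still reference the old com.digitalpebble.stormcrawler coordinates, so expect to apply the group ID and artifact ID changes described earlier there as well.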
Migrating to OpenSearch Module
If you choose to migrate to the OpenSearch module, you will need to update your code to use the new module. Here are the steps:
- Add the OpenSearch dependency to your pom.xml:
<dependency>
    <groupId>org.apache.stormcrawler</groupId>
    <artifactId>stormcrawler-opensearch</artifactId>
    <version>NEW_VERSION</version>
</dependency>
- Update your code to replace any references to the Elasticsearch module with the corresponding OpenSearch module classes and methods. The API for OpenSearch is similar to Elasticsearch, so the changes should be straightforward.
- Test your application thoroughly to ensure that the migration does not introduce any issues.
Summary
By following the steps outlined in this guide, you should be able to migrate your project to the latest version of Apache StormCrawler with minimal effort. Ensure you update your Maven dependencies and handle the removal of the Elasticsearch module by either forking it or migrating to the OpenSearch module.
For any further assistance, reach out to the community via mailing lists or GitHub discussions.