The goal of this task is to perform a web crawl on a URL string provided by the user. From the crawl, you will need to parse out all of the images on that web page and return a JSON array of strings that represent the URLs of all images on the page. [Jsoup](https://jsoup.org/) is a great basic library for crawling and is already included as a Maven dependency in this project; however, you are welcome to use whatever library you would like.
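As a rough illustration of the expected output shape, the sketch below turns a list of image URLs into a JSON array of strings. The class name `ImageUrlJson` and its methods are hypothetical and not part of the provided project; a submission could just as well use a JSON library, and this hand-rolled version only exists to keep the example dependency-free.

```java
import java.util.Arrays;
import java.util.List;

public class ImageUrlJson {

    /** Escapes the characters that may not appear raw inside a JSON string. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the four-digit hex form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds a JSON array of strings, e.g. ["url1","url2"]. */
    public static String toJsonArray(List<String> urls) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < urls.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(escape(urls.get(i))).append('"');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // Prints ["https://example.com/a.png","https://example.com/b.jpg"]
        System.out.println(toJsonArray(Arrays.asList(
                "https://example.com/a.png", "https://example.com/b.jpg")));
    }
}
```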
## Required Functionality
We expect your submission to be able to achieve the following goals:
- Build a web crawler that can find all images on the web page(s) that it crawls.
- Crawl sub-pages to find more images.
- Implement multi-threading so that the crawl can be performed on multiple pages at a time.
- Keep your crawl within the same domain as the input URL.
- Avoid re-crawling any pages that have already been visited.
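The last three requirements (multi-threading, staying on the same domain, and not re-crawling pages) can be sketched together as below. This is only one possible shape, under several assumptions: `CrawlSketch` and its `fetchLinks` parameter are hypothetical names, the link fetcher is a plain function standing in for whatever actually downloads and parses a page (Jsoup, for example), and "same domain" is simplified to an exact host match, so `www.example.com` and `example.com` would count as different sites.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CrawlSketch {
    private final Set<String> visited = ConcurrentHashMap.newKeySet(); // thread-safe "seen" set
    private final AtomicInteger inFlight = new AtomicInteger();        // pages queued or running
    private final CountDownLatch done = new CountDownLatch(1);
    private final ExecutorService pool;
    private final String host;
    private final Function<String, List<String>> fetchLinks;

    private CrawlSketch(String startUrl, int threads, Function<String, List<String>> fetchLinks) {
        this.host = hostOf(startUrl);
        this.pool = Executors.newFixedThreadPool(threads);
        this.fetchLinks = fetchLinks;
    }

    /** Crawls from startUrl and returns every same-host page that was visited. */
    public static Set<String> crawl(String startUrl, int threads,
                                    Function<String, List<String>> fetchLinks) {
        CrawlSketch c = new CrawlSketch(startUrl, threads, fetchLinks);
        try {
            if (c.submit(startUrl)) c.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop waiting, keep the interrupt flag
        } finally {
            c.pool.shutdown();
        }
        return c.visited;
    }

    private static String hostOf(String url) {
        try {
            return URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return null; // malformed URL
        }
    }

    private boolean submit(String url) {
        String h = hostOf(url);
        if (h == null || !h.equalsIgnoreCase(host)) return false; // off-site or malformed
        if (!visited.add(url)) return false;                      // another task already claimed it
        inFlight.incrementAndGet();
        pool.execute(() -> {
            try {
                for (String link : fetchLinks.apply(url)) submit(link);
            } finally {
                // Children were counted before this decrement, so zero means the crawl is finished.
                if (inFlight.decrementAndGet() == 0) done.countDown();
            }
        });
        return true;
    }

    public static void main(String[] args) {
        // An in-memory "site" used instead of real HTTP requests.
        Map<String, List<String>> site = new HashMap<>();
        site.put("https://example.com/", Arrays.asList("https://example.com/a", "https://other.org/"));
        site.put("https://example.com/a", Arrays.asList("https://example.com/", "https://example.com/b"));
        Set<String> pages = crawl("https://example.com/", 4,
                url -> site.getOrDefault(url, Collections.emptyList()));
        // Prints [https://example.com/, https://example.com/a, https://example.com/b]
        System.out.println(new TreeSet<>(pages));
    }
}
```

The call to `visited.add(url)` both checks and records a page atomically, so two threads that discover the same link cannot both crawl it.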
## Extra Functionality
No individual point below is explicitly required, but we recommend trying to achieve the following goals as well:
- Make your crawler "friendly" - try not to get banned from the site by performing too many crawls.
- Try to detect what images might be considered logos.
- Show off your front-end dev skills with Javascript, HTML, and/or CSS to make the site look more engaging.
- Any other way you feel you can show off your strengths as a developer.
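Two of the optional goals above lend themselves to small helpers, sketched here under stated assumptions. `PoliteDelay` spaces requests to a site at least a fixed gap apart (the 500 ms value is an arbitrary illustration, not a recommended setting), and `looksLikeLogo` is a crude keyword heuristic over an image's `src`, `alt`, and `class` values. Both names are hypothetical, and the heuristic will produce false positives (any URL containing "brand", for instance), so treat it as a starting point rather than a reliable detector.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ExtraGoalsSketch {

    /** Spaces out requests to one site; the caller sleeps for the returned time. */
    public static final class PoliteDelay {
        private final long minGapMillis;
        private long nextAllowedMillis = 0;

        public PoliteDelay(long minGapMillis) {
            this.minGapMillis = minGapMillis;
        }

        /** Reserves the next request slot and returns how long to wait before using it. */
        public synchronized long reserve(long nowMillis) {
            long slot = Math.max(nowMillis, nextAllowedMillis);
            nextAllowedMillis = slot + minGapMillis;
            return slot - nowMillis;
        }
    }

    // Assumed hint words; adjust them to the sites being crawled.
    private static final List<String> LOGO_HINTS = Arrays.asList("logo", "favicon", "brand");

    /** Guesses whether an image is a logo from its src, alt text, and CSS class. */
    public static boolean looksLikeLogo(String src, String alt, String cssClass) {
        String text = (str(src) + " " + str(alt) + " " + str(cssClass)).toLowerCase(Locale.ROOT);
        for (String hint : LOGO_HINTS) {
            if (text.contains(hint)) return true;
        }
        return false;
    }

    private static String str(String s) {
        return s == null ? "" : s;
    }

    public static void main(String[] args) throws InterruptedException {
        PoliteDelay delay = new PoliteDelay(500);
        for (int i = 0; i < 3; i++) {
            Thread.sleep(delay.reserve(System.currentTimeMillis()));
            System.out.println("fetch #" + i); // a real crawler would request a page here
        }
        // Prints true
        System.out.println(looksLikeLogo("https://example.com/img/site-logo.png", "", null));
    }
}
```

Because `reserve` is synchronized and hands out slots in order, several crawler threads can share one `PoliteDelay` per host without bunching their requests together.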
PLEASE do not send us a submission with only a basic Jsoup crawl and only a couple of lines of code. This is your chance to prove what you could contribute to our team.
Your project will be due exactly 48 hours after you receive it. To submit, zip up your project (`imagefinder.zip`) and email it back to me. Please include a list of the URLs that you used for testing in your submission. You should place them in the attached `test-links.txt` file found in the root of this project.
## Structure
The ImageFinder servlet is found in `src/main/java/com/eulerity/hackathon/imagefinder/ImageFinder.java`. This is the only provided Java class. Feel free to add more classes or packages as you see fit.
The main landing page for this project can be found in `src/main/webapp/index.html`. This page contains more instructions and serves as the starting page for the web application. You may edit this page as much as it suits you, and/or add other pages.
Finally, in the root directory of this project, you will find the `pom.xml`. This contains the project configuration details used by Maven to build the project. If you want or need to use outside dependencies, you should add them to this file.
## Running the Project
Here we will detail how to set up and run this project so you may get started, as well as the requirements needed to do so.
### Requirements
Before beginning, make sure you have the following installed and ready to use:
- Maven 3.5 or higher
- Java 8
### Setup
To start, open a terminal window and navigate to the root directory `imagefinder`, wherever you unzipped it. To build the project, run the command:
>`mvn package`
If all goes well, you should see output that ends with "BUILD SUCCESS". When you build your project, Maven places the build output in the `target` directory. To clear this directory, you may run the command:
>`mvn clean`
To run the project, use the following command to start the server:
>`mvn clean test package jetty:run`
You should see a line at the bottom that says "Started Jetty Server". Now, if you enter `localhost:8080` into your browser, you should see the `index.html` welcome page! If all has gone well to this point, you're ready to begin!
## Submission
When you are finished working on the project, before zipping up and emailing back your submission, **PLEASE RUN ONE LAST `mvn clean` COMMAND TO REMOVE ANY UNNECESSARY FILES FROM YOUR SUBMISSION**. Please also make sure to add the URLs you used to test your project to the `test-links.txt` file. After doing these things, you may zip up the root directory (`imagefinder`) and email it back to us.
## Final Notes
- If you feel you need more time to work, you are free to ask for it.
- If you are having any trouble, especially with the setup, please reach out and we will try to answer as soon as we can.
- The ideas listed above on how to expand the project are great starting points, but feel free to add in your own ideas as well.
- Try to follow good-practice principles when working on your code, such as clean, meaningful variable and method names.
- The code we have provided is meant to let you hit the ground running. You are free to use whatever web service you would like (as long as you use Java 8 and it is runnable from the command line).
- We look forward to seeing what you can do, so good luck and have fun!

Question

03-06-2022
Computers and Technology

Answered




saimaparvezVT · Answer 1 · 2022-06-14T09:01:52-04:00

The goal of this task is to perform a web crawl starting from a URL string provided by the user. The crawl begins by adding one or more URLs to be visited.

What is a multithreaded internet crawler?

A multithreaded crawler uses several threads. It can crawl all the pages of a particular website and report back any 2XX and 4XX links. It takes the domain name from the command line, and it avoids cyclic traversal of links.

Here are the basic steps to build a crawler:

Step 1: Add one or more URLs to be visited.

Step 2: Pop a link from the URLs to be visited and add it to the list of visited URLs.

Step 3: Fetch the page's content and scrape the data you are interested in with the ScrapingBot API.
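The three steps above can be sketched as a simple single-threaded loop. This is an illustration under assumptions, not the answerer's code: `FrontierLoop`, `Page`, and the `fetch` function are hypothetical names, and `fetch` is a stand-in for whatever retrieves and parses a page (the ScrapingBot API mentioned above, or Jsoup), so the example runs without network access.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class FrontierLoop {

    /** What a fetch is assumed to return: the page's outgoing links and its image URLs. */
    public static final class Page {
        final List<String> links;
        final List<String> images;

        public Page(List<String> links, List<String> images) {
            this.links = links;
            this.images = images;
        }
    }

    public static Set<String> collectImages(List<String> seeds, Function<String, Page> fetch) {
        Deque<String> toVisit = new ArrayDeque<>(seeds);  // Step 1: URLs to be visited
        Set<String> visited = new HashSet<>();
        Set<String> images = new LinkedHashSet<>();       // keeps discovery order, drops repeats
        while (!toVisit.isEmpty()) {
            String url = toVisit.poll();                  // Step 2: pop a link...
            if (!visited.add(url)) continue;              // ...and mark it visited (skip repeats)
            Page page = fetch.apply(url);                 // Step 3: fetch and scrape
            if (page == null) continue;                   // page could not be fetched
            images.addAll(page.images);
            for (String link : page.links) {
                if (!visited.contains(link)) toVisit.add(link);
            }
        }
        return images;
    }

    public static void main(String[] args) {
        // An in-memory "site" used instead of real HTTP requests.
        Map<String, Page> site = new HashMap<>();
        site.put("https://example.com/", new Page(
                Arrays.asList("https://example.com/about"),
                Arrays.asList("https://example.com/logo.png")));
        site.put("https://example.com/about", new Page(
                Arrays.asList("https://example.com/"),   // link back to the start: a cycle
                Arrays.asList("https://example.com/team.jpg", "https://example.com/logo.png")));
        // Prints [https://example.com/logo.png, https://example.com/team.jpg]
        System.out.println(collectImages(Collections.singletonList("https://example.com/"), site::get));
    }
}
```

The visited check in Step 2 is what prevents the cyclic traversal mentioned earlier: the link from the second page back to the first is popped but skipped.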
