In this blog, we’ll learn how web scraping is done with Java by extracting images from a URL provided by the user. A version of this project can be found at Image Extractor By Aashish Katwal. If you like it, don’t forget to give it a ⭐; contributions are warmly welcome.
What is Web Scraping?
Web scraping is the process of obtaining data from a website, on a large or small scale. We can obtain specific data such as images and tables, or the source code of an entire website. The data can be used for purposes such as data harvesting and research, and analyzed to get insights as required.
How does Web Scraping work?
A web scraper can obtain all the data on a website or only the desired part. First, we provide the URL of the website we want to scrape. It is good to specify what type of data we want so that the process is quick and efficient.
For example, if we want images from a website (which is what we are going to scrape in this blog), we specify that only elements with img tags should be fetched. The scraper then collects every img tag it finds on the page at the provided URL.
A web scraper loads the HTML code from the URL, though some advanced scrapers can also load CSS and JavaScript. The extracted data can be stored in an Excel, CSV, or JSON file, which can later be used for research, analysis, or other purposes.
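To make the storage step concrete, here is a minimal sketch of turning a list of scraped image URLs into CSV text using only the standard library. The class name ImageCsvExporter, its methods, and the example.com URLs are hypothetical and are not part of the Image Extractor project.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of the original project): turns image URLs into CSV text. */
final class ImageCsvExporter {

    /** Quotes a field when it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (needsQuotes) {
            // Inside a quoted field, each double quote is doubled.
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Builds a header row followed by one "index,url" row per image. */
    static String toCsv(List<String> imageUrls) {
        StringBuilder csv = new StringBuilder("index,url\n");
        for (int i = 0; i < imageUrls.size(); i++) {
            csv.append(i + 1).append(',').append(escape(imageUrls.get(i))).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // Made-up URLs, used only to show the shape of the output.
        List<String> urls = Arrays.asList(
                "https://example.com/logo.png",
                "https://example.com/a,b.jpg");
        System.out.print(toCsv(urls));
    }
}
```

The returned text could then be written to a file, for example with java.nio.file.Files.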
Web Scraping with Java
In Java, there’s a library called Jsoup, which is one of the most popular Java libraries for web scraping. I am building a Maven version that uses JSP. If you want to build it as a plain Java application or a regular web project, you can download the jar file from the Jsoup website and include it in your project.
-
index.jsp
<form method="POST" action="scrape">
    <input type="url" name="webURL" required/>
    <input type="submit" value="Scrape" />
</form>
</section>
<% if (request.getSession().getAttribute("url") != null && request.getSession().getAttribute("validUrls") != null) { %>
<main>
    <h1 class="section-title">RESULT</h1>
    <div class="show-result">
        <%@include file="results.jsp" %>
    </div>
</main>
<% }
    request.getSession().removeAttribute("url");
%>
This is the form through which a user submits the URL to be scraped for images. The <%@include file="results.jsp" %> directive includes the content of the results.jsp file inside the div with class show-result.
-
In servlet
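@Override
protected void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // The URL submitted through the form in index.jsp.
    String url = request.getParameter("webURL");
    request.getSession().setAttribute("url", url);
    request.getSession().setAttribute("validUrls", new WebScrapper().getAllImgs(url));
    // Prefixing the context path keeps the redirect working when the app is not deployed at the server root.
    response.sendRedirect(request.getContextPath() + "/index.jsp");
}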
This receives the URL entered by the user on the index.jsp page and stores it in a session attribute. It also calls the getAllImgs() method and stores the returned list in the validUrls session attribute.
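The type="url" input only validates in the browser, so the servlet may still receive arbitrary text. As a sketch, the parameter could be checked on the server before it is handed to the scraper. The ScrapeUrlCheck class and its isHttpUrl method are hypothetical names, not part of the original project, and the check uses only java.net.URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical guard (not in the original project) for the URL submitted by the form. */
final class ScrapeUrlCheck {

    /** Returns true only for syntactically valid http or https URLs that have a host. */
    static boolean isHttpUrl(String candidate) {
        if (candidate == null || candidate.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(candidate.trim());
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException ex) {
            // Text such as "not a url" is not a valid URI at all.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isHttpUrl("https://example.com/gallery"));
        System.out.println(isHttpUrl("ftp://example.com/file"));
    }
}
```

In doPost, the servlet could call this check first and skip the getAllImgs() call when it returns false.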
-
In Java file (WebScrapper.java)
Image Extraction method:
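// Required imports (at the top of WebScrapper.java):
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public List<String> getAllImgs(String sUrl) {
    // Collects the valid image URLs.
    List<String> validSourceUrls = new ArrayList<>();
    try {
        // Fetches the page and parses it into a Document.
        Document doc = Jsoup.connect(sUrl).get();
        for (Element element : doc.select("img[src]")) {
            String srcUrl = element.attr("src");
            // On JDK 11 or higher, srcUrl.isBlank() can be used instead to also skip whitespace-only values.
            if (srcUrl.isEmpty()) {
                continue;
            }
            // Removes the query from the URL first, so that "photo.jpg?w=200" is validated as "photo.jpg".
            int queryStart = srcUrl.indexOf('?');
            if (queryStart >= 0) {
                srcUrl = srcUrl.substring(0, queryStart);
            }
            if (validateSrcUrl(srcUrl)) {
                validSourceUrls.add(srcUrl);
            }
        }
    } catch (IOException ex) {
        // If the fetch fails, the (possibly empty) list collected so far is returned.
        ex.printStackTrace();
    }
    return validSourceUrls;
}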
In this method, the connection to the provided URL is established, valid image src URLs are added to a list, and that list is returned. The Jsoup.connect(sUrl) call creates the connection, and .get() fetches and parses the HTML document. doc.select("img[src]") then selects every img element that has a src attribute, and the loop visits them one at a time as element. Each image source URL is then validated, and the valid ones are added to the list.
URL validation method:
public boolean validateSrcUrl(String url) {
    // Popular image file extensions (needs java.util.Arrays in addition to java.util.List).
    List<String> validUrlItems = Arrays.asList("jpg", "png", "jpeg", "svg", "gif");
    // Takes everything after the last .(dot) and checks if it is one of the above extensions.
    // lastIndexOf avoids the exception that split("\\.") would cause for a URL made up only of dots.
    String extension = url.substring(url.lastIndexOf('.') + 1);
    return validUrlItems.contains(extension) || url.contains(".github");
}
This method checks whether the URL points to an image. To do so, it checks if the link ends with any of the popular image extensions. The .github check is there because GitHub images do not end with any extension.
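The extension check can be tried out without any network access. The sketch below is a standalone variant under a few assumptions of my own: the class name ImageUrlFilter and its methods are hypothetical, the comparison is made case-insensitive (so .PNG also passes), the query is stripped before the check, and the .github special case is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical standalone variant of the extension check (not the project's exact code). */
final class ImageUrlFilter {

    private static final List<String> IMAGE_EXTENSIONS =
            Arrays.asList("jpg", "jpeg", "png", "svg", "gif");

    /** Drops everything from the first '?' onwards. */
    static String stripQuery(String url) {
        int queryStart = url.indexOf('?');
        return queryStart >= 0 ? url.substring(0, queryStart) : url;
    }

    /** Compares the text after the last dot, ignoring case (an assumption beyond the original). */
    static boolean hasImageExtension(String url) {
        int dot = url.lastIndexOf('.');
        if (dot < 0) {
            return false;
        }
        String extension = url.substring(dot + 1).toLowerCase(Locale.ROOT);
        return IMAGE_EXTENSIONS.contains(extension);
    }

    /** Skips blank values, strips queries, and keeps only URLs with an image extension. */
    static List<String> keepImages(List<String> srcValues) {
        List<String> images = new ArrayList<>();
        for (String src : srcValues) {
            if (src == null || src.trim().isEmpty()) {
                continue;
            }
            String cleaned = stripQuery(src.trim());
            if (hasImageExtension(cleaned)) {
                images.add(cleaned);
            }
        }
        return images;
    }

    public static void main(String[] args) {
        // Made-up src values, used only for illustration.
        System.out.println(keepImages(Arrays.asList("logo.PNG?v=3", "", "page.html", "/img/cat.jpeg")));
    }
}
```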
-
results.jsp
<%@page import="java.util.List" %>
<%
    String url = (String) request.getSession().getAttribute("url");
    List<String> sources = (List<String>) request.getSession().getAttribute("validUrls");
    for (String source : sources) {
        // Absolute URLs are used as-is; anything else is treated as relative to the submitted URL.
        if (!source.startsWith("http://") && !source.startsWith("https://")) {
            source = url + "/" + source;
        }
        // The last path segment is the file name.
        String name = source.split("/")[source.split("/").length - 1];
%>
<div class="image-wrapper">
    <div class="image">
        <img src="<%= source%>" alt="alt"/>
    </div>
    <div class="details">
        <a href="<%= source%>" title="See this Image" target="_blank">Visit</a>
        <span class="imageName" title="<%= name.split("\\.")[0]%>">
            <%= name.split("\\.")[0] %>
        </span>
    </div>
</div>
<%
    }
    sources.clear();
%>
This reads the list of valid URLs returned by the getAllImgs() method and inserts each one into the src attribute of an img tag. It also displays the image name.
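Joining the page URL and the src value with a "/" works for simple cases, but root-relative paths such as /img/a.png need proper resolution. As a hedged alternative, the standard java.net.URI class can resolve a src value against the page URL. The ImageLinkResolver class, its methods, and the example.com URLs are hypothetical, and URI.create throws IllegalArgumentException for values with illegal characters such as spaces, so real input would need extra handling.

```java
import java.net.URI;

/** Hypothetical helper (not in the original project) for building image links and names. */
final class ImageLinkResolver {

    /** Resolves a src value (absolute, root-relative, or relative) against the page URL. */
    static String resolve(String pageUrl, String src) {
        URI base = URI.create(pageUrl);
        // A base without any path (e.g. "https://example.com") is given "/" first,
        // so that the relative part is not glued directly onto the host name.
        if (base.getPath() == null || base.getPath().isEmpty()) {
            base = base.resolve("/");
        }
        return base.resolve(src).toString();
    }

    /** Returns the file name up to its first dot, like name.split("\\.")[0] in the JSP. */
    static String displayName(String imageUrl) {
        String path = URI.create(imageUrl).getPath();
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        int dot = fileName.indexOf('.');
        return dot >= 0 ? fileName.substring(0, dot) : fileName;
    }

    public static void main(String[] args) {
        // Made-up page and image paths, used only for illustration.
        String link = resolve("https://example.com/blog/post", "img/a.png");
        System.out.println(link + " -> " + displayName(link));
    }
}
```

If Jsoup has already parsed the page, its element.absUrl("src") call offers a similar resolution based on the document's base URI.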
Final Words
Web scraping has a very wide range of applications. It is not limited to extracting data from one place and displaying it in another.
It is used in a variety of sectors such as investment, startups, and marketing.
After some styling on the above HTML code, this is what the final result looks like: