Read text from PDF using iTextSharp

10.10.2018


 

In C#, you can build a working PDF text reader with just a few lines of code.

To do this, add the NuGet package iTextSharp to your project.

 

 

A small Windows program built with iTextSharp in C# and WPF: a PDF text reader.

 

In this example, the PDF document shown on the right was read in, and its extracted text was passed to the C# WPF application.

 

 

Main Window.

 

XAML code, MainWindow.xaml

<Window x:Class="PDF_TextReader.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        xmlns:local="clr-namespace:PDF_TextReader"
        mc:Ignorable="d"
        Title="MainWindow" Height="700" Width="800">
    <Grid>
        <Button x:Name="btnStart" Content="Read PDF" Click="btnStart_Click" HorizontalAlignment="Left" Margin="15,9,0,0" VerticalAlignment="Top" Width="86" Height="33"/>
        <TextBox x:Name="tbxFilename" Text="C:\_Daten\Desktop\VS_Projects\Office\PDF_TextReader\_Test_PDF\test_pdf_import.pdf"
                 Width="631" Height="27" Margin="115,12,0,0" TextWrapping="Wrap" VerticalAlignment="Top" HorizontalAlignment="Left" />
        <ScrollViewer Height="584" Margin="16,71,25.6,0" VerticalAlignment="Top">
            <TextBlock x:Name="lblPDF_Output" Text=""
                       TextWrapping="Wrap" HorizontalAlignment="Stretch" VerticalAlignment="Stretch" />
        </ScrollViewer>
    </Grid>
</Window>

 

 

 

C# code-behind for the window

 

new PdfReader(sFilename) opens a PDF document with the iTextSharp reader.

PdfReader pdf_Reader = new PdfReader(sFilename);
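
Opening the reader can fail if the path is wrong or the file is not a valid PDF, and the reader holds the file until it is closed. The following is a minimal sketch of a guarded open, not part of the original program: the class PdfOpenSketch and the method GetPageCount are hypothetical names, and the sketch assumes that, as in iTextSharp 5, an unreadable or invalid file surfaces as an IOException.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class PdfOpenSketch
{
    // Hypothetical helper: returns the number of pages,
    // or -1 if the file is missing or cannot be opened as a PDF.
    public static int GetPageCount(string sFilename)
    {
        if (!File.Exists(sFilename))
        {
            return -1;
        }

        PdfReader pdf_Reader = null;
        try
        {
            pdf_Reader = new PdfReader(sFilename);
            return pdf_Reader.NumberOfPages;
        }
        catch (IOException)
        {
            // Assumption: invalid or unreadable PDFs are reported as IOException
            return -1;
        }
        finally
        {
            // Always release the file, even if reading failed
            if (pdf_Reader != null)
            {
                pdf_Reader.Close();
            }
        }
    }
}
```

A caller could check the result before extracting any text, for example: if (PdfOpenSketch.GetPageCount(tbxFilename.Text) < 1) return;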

 

A call to PdfTextExtractor.GetTextFromPage returns the complete text of one PDF page as a single string, including line-break characters.

Non-text content such as images, scanned pages, and empty tables is omitted.

sText = PdfTextExtractor.GetTextFromPage(pdf_Reader, 1);
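
GetTextFromPage also has an overload that takes an extraction strategy. The sketch below passes a LocationTextExtractionStrategy, which orders text chunks by their position on the page, and then splits the returned string into lines. The class PageTextSketch and the method GetPageLines are hypothetical names used only for illustration, and the split on '\n' assumes that the strategy separates lines with that character.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PageTextSketch
{
    // Hypothetical helper: returns the non-empty text lines of one page.
    // pageNumber is 1-based, as in the rest of iTextSharp.
    public static string[] GetPageLines(PdfReader pdf_Reader, int pageNumber)
    {
        // Create a new strategy object for every page, because it collects
        // the text chunks it has seen.
        // SimpleTextExtractionStrategy is an alternative that keeps the order
        // in which the text appears in the page's content stream.
        ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();

        string sText = PdfTextExtractor.GetTextFromPage(pdf_Reader, pageNumber, strategy);

        // Assumption: lines are separated by '\n'
        return sText.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Working with individual lines is useful when the PDF has a regular layout, such as a statement or a report, and only certain rows are of interest.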

 

 

 

Code in C#, .NET Framework 4.7

In MainWindow.xaml.cs

 

using System;
using System.Text;
using System.Windows;

//< using >
using iTextSharp.text.pdf;          //*iTextSharp
using iTextSharp.text.pdf.parser;   //*iTextSharp Text-Reader
//</ using >

namespace PDF_TextReader
{
    /// <summary>
    /// Demo PDF text reader
    /// </summary>
    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();
        }

        private void btnStart_Click(object sender, RoutedEventArgs e)
        {
            //String sFilename = "C:\\_Daten\\Desktop\\VS_Projects\\Office\\PDF_TextReader\\_Test_PDF\\test_pdf_import_bank.pdf";
            // Path of the PDF file, taken from the text box
            String sFilename = tbxFilename.Text;

            //--< read File >--
            PdfReader pdf_Reader = new PdfReader(sFilename);
            StringBuilder sbText = new StringBuilder();
            try
            {
                // Page numbers are 1-based
                for (int i = 1; i <= pdf_Reader.NumberOfPages; i++)
                {
                    // AppendLine keeps the text of consecutive pages from running together
                    sbText.AppendLine(PdfTextExtractor.GetTextFromPage(pdf_Reader, i));
                }
            }
            finally
            {
                // Release the file again
                pdf_Reader.Close();
            }
            //--</ read File >--

            //MessageBox.Show(sbText.ToString());
            lblPDF_Output.Text = sbText.ToString();
        }
    }
}

 

 

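NuGet package: iTextSharp

In the WPF project, add the iTextSharp package via NuGet, for example with the command Install-Package iTextSharp in the Package Manager Console.

iTextSharp is licensed under the AGPL. It is free to use as long as you comply with the AGPL terms, which generally require you to release your own application's source code under the same license when you distribute it. For closed-source software, a commercial license is required.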

 

 

Description of iTextSharp (from the NuGet package page):

iText is a PDF library that allows you to CREATE, ADAPT, INSPECT and MAINTAIN documents in the Portable Document Format (PDF), allowing you to add PDF functionality to your software projects with ease.  We even have documentation to help you get coding.

 

We have two currently supported versions: iText 5 and iText 7. Both are available under AGPL and Commercial license.

* iText 5 AGPL

* iText 7 community: https://www.nuget.org/packages/itext7/

iText 5 is a one solution library that is complex, but well documented to help you create your solutions.

iText 7 is a complete re-write of iText 5, allowing you to choose your adventure with add-ons, all based on a simple, modular code structure that is easy to use and well documented.

 

 

Both versions allow you to:

- Generate documents and reports based on data from an XML file or a database

- Create maps and books, exploiting numerous interactive features available in PDF

- Add bookmarks, page numbers, watermarks, and other features to existing PDF documents

- Split or concatenate pages from existing PDF files

- Fill out interactive forms

- Serve dynamically generated or manipulated PDF documents to a web browser

 

iText 7 includes pdfDebug, the first debugging tool that gives you a clear overview of your content streams and document structure as well as pdfCalligraph, allowing you to leverage advanced typography.

iText is available for Java, .NET in both versions, and Android and GAE for iText 5 only.

iTextSharp is the .NET port of iText 5.

 

Several iText engineers are actively supporting the project on StackOverflow: http://stackoverflow.com/questions/tagged/itext