A key advantage of the TextParser library is that it runs on any platform that implements .NET Standard 2.0, so you can use it in almost any type of .NET application.
This walkthrough explains how to use the TextParser library in a WinForms application to extract text from HTML documents (for example, emails). It also explains how to visualize the extraction results using controls such as FlexGrid, FlexPie, and others.
After completing this walkthrough, you will know how to:
- Extract text using the HTML extractor
- Populate WinForms controls with the extraction results
Let us work through an example. Consider a scenario where we want to extract information (for example, customer name, total order amount, and ordered items) from order confirmation emails received from an e-commerce provider like Amazon, as shown in the image below.
Emails sent by a specific provider generally follow the same structure, so one email can be used as a template to extract the data from all the others. The HTML extractor provided by the TextParser library suits this scenario because it can extract the desired text from HTML documents correctly even if slight differences exist between the template email and the source emails.
Step 1: Extracting text using HTML extractor
- Create a new Windows Forms application.
- Install the ‘C1.TextParser’ NuGet package in your application to add the appropriate references to the project.
- Copy the template email (‘amazonEmail1.html’) and the source email (‘amazonEmail2.html’) files from the ECommerceOrder product sample to your project folder.
- Load the template email by adding the following line of code to Form1.cs:
//Open the stream that serves as the template for extracting data from similar HTML streams (requires using System.IO;).
Stream amazonTemplateStream = File.Open(@"..\..\amazonEmail1.html", FileMode.Open);
- Initialize the HtmlExtractor class with the loaded template email, using the code provided below:
//Initialize the HtmlExtractor class to extract data from HTML sources based on the template.
HtmlExtractor amazonTemplate = new HtmlExtractor(amazonTemplateStream);
- Define fixed placeholders to extract the customer name, the expected delivery date, and the total order amount by using the AddPlaceHolder method of the HtmlExtractor class. Fixed placeholders are marked with blue boxes in the image above.
//Fixed placeholder for the customer name.
String customerNameXPath =
@"/html/body/div[2]/div/div/div/table/tbody/tr[2]/td/p[1]";
amazonTemplate.AddPlaceHolder("CustomerName", customerNameXPath, 6, 15);
//Fixed placeholder for the expected delivery date.
String deliveryDateXPath =
@"/html/body/div[2]/div/div/div/table/tbody/tr[3]/td/table/tbody/tr[1]/td[1]/p/strong";
amazonTemplate.AddPlaceHolder("DeliveryDate", deliveryDateXPath);
//Fixed placeholder for the total order amount.
String totalAmountXPath = @"//*[@id=""shipmentDetails""]/table/tbody/tr[8]/td[2]/strong";
amazonTemplate.AddPlaceHolder("TotalOrderAmount", totalAmountXPath);
- Define repeated placeholders to extract the name, price, and seller of each ordered item. Repeated placeholders are marked with red boxes in the image above.
//Repeated block that recurs, in order, for each ordered item.
String articleNameXPath = @"//*[@id=""shipmentDetails""]/table/tbody/tr[1]/td[2]/p/a";
amazonTemplate.AddPlaceHolder("OrderedArticles", "ArticleName", articleNameXPath);
String articlePriceXPath = @"//*[@id=""shipmentDetails""]/table/tbody/tr[1]/td[3]/strong";
amazonTemplate.AddPlaceHolder("OrderedArticles", "ArticlePrice", articlePriceXPath);
String articleSellerXPath = @"//*[@id=""shipmentDetails""]/table/tbody/tr[1]/td[2]/p/span";
amazonTemplate.AddPlaceHolder("OrderedArticles", "ArticleSeller", articleSellerXPath, 8, 18);
- Load the source email and invoke the Extract method of the HtmlExtractor class to extract the desired text from it. The Extract method returns the results as an IExtractionResult.
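//Open the stream from which the data is to be extracted.
Stream source = File.Open(@"..\..\amazonEmail2.html", FileMode.Open);
//Extract the data; the later steps refer to this result as 'extractedResult'.
IExtractionResult extractedResult = amazonTemplate.Extract(source);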
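The placeholder calls in this step pair an XPath, which selects a node, with two optional integers. The sketch below illustrates that idea using System.Xml only, not TextParser itself. It assumes that the two integers (for example, 6 and 15 for 'CustomerName') are a start offset and a length within the node's text. That reading is an assumption, and the helper name SelectText and the sample markup are hypothetical.

```csharp
using System;
using System.Xml;

public static class PlaceholderSketch
{
    // Hypothetical helper, not part of TextParser: selects a node by XPath and
    // returns 'length' characters of its text starting at 'start'.
    // A negative length means "to the end of the text".
    public static string SelectText(string xhtml, string xPath, int start = 0, int length = -1)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);                      // the input must be well-formed XML

        XmlNode node = doc.SelectSingleNode(xPath);
        if (node == null)
            return null;                         // the XPath matched nothing

        string text = node.InnerText;
        if (start >= text.Length)
            return string.Empty;                 // the offset lies beyond the text

        // Clamp the length so that Substring never runs past the end.
        int available = text.Length - start;
        int take = (length < 0 || length > available) ? available : length;
        return text.Substring(start, take);
    }
}
```

For example, SelectText("<html><body><p>Hello Jane Doe,</p></body></html>", "/html/body/p[1]", 6, 8) skips the six-character greeting and returns "Jane Doe". This sketch matches the XPath exactly, whereas the HtmlExtractor is described above as tolerating slight differences between the template and the source.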
Step 2: Designing the dashboard
- Drag and drop the DashboardLayout control from the toolbox onto your Form and set its Dock property to Fill.
Observe: A layout of the type Split is attached to the DashboardLayout control and it contains two child containers (Splitter Panels) by default.
- Right-click inside the DashboardLayout control to open its context menu, and then click the ‘Select c1DashboardLayout1.SplitContentPanel’ option.
Observe: The SplitContentPanel (layout control attached to the DashboardLayout by default) is selected.
- Click on the SplitContentPanel’s smart tag to open its Tasks Menu. Select ‘Add Panel’ from the DashboardSplitContainer Tasks menu to add a third child container to the dashboard.
- Select ‘c1SplitterPanel1’ and set its Dock property to Left.
- Drag and drop the RichTextBox control from the Toolbox onto ‘c1SplitterPanel1’. Set the following properties: Font size to ‘10.2’ and BackColor to ‘226, 218, 241’.
- Drag and drop the C1FlexGrid control from the Toolbox onto ‘c1SplitterPanel2’.
- Drag and drop the FlexPie control from the Toolbox onto ‘c1SplitterPanel3’. Set its Dock property to Fill.
Step 3: Populating Dashboard controls with the extraction results
Display extraction results as a JSON string in the RichTextBox control:
- Convert the extracted result to JSON format and assign it to the Text property of the RichTextBox control. Add the following code to Form1.cs to implement the described approach:
richTextBox1.Text = extractedResult.ToJsonString();
Display extraction results in a FlexGrid control:
- Configure the properties of the FlexGrid control by adding and calling the following method in Form1.cs:
private void ConfigureFlexGrid()
{
    c1FlexGrid1.Rows.Count = 1;
    c1FlexGrid1.Cols.Count = 2;
    c1FlexGrid1.Cols.Fixed = 0;
    c1FlexGrid1[0, 0] = "Placeholder";
    c1FlexGrid1[0, 1] = "Value";
    c1FlexGrid1.Cols[0].StarWidth = "*";
    c1FlexGrid1.Cols[1].StarWidth = "2*";
    c1FlexGrid1.Font = new System.Drawing.Font("Segoe UI", 8.45f);
    c1FlexGrid1.Row = -1;
    // Styles
    CellStyle cs = c1FlexGrid1.Styles.Normal;
    cs.Border.Direction = BorderDirEnum.Vertical;
    cs.TextAlign = TextAlignEnum.LeftCenter;
    cs = c1FlexGrid1.Styles.Add("Data");
    c1FlexGrid1.Styles.Alternate.BackColor = Color.FromArgb(232, 216, 232);
    // Outline tree
    c1FlexGrid1.Tree.Column = 0;
    c1FlexGrid1.Tree.Style = TreeStyleFlags.Simple;
    c1FlexGrid1.Tree.LineStyle = System.Drawing.Drawing2D.DashStyle.Solid;
    c1FlexGrid1.AllowMerging = AllowMergingEnum.Nodes;
    // Miscellaneous
    c1FlexGrid1.AllowResizing = AllowResizingEnum.Columns;
    c1FlexGrid1.SelectionMode = SelectionModeEnum.Cell;
    c1FlexGrid1.HighLight = HighLightEnum.Always;
    c1FlexGrid1.FocusRect = FocusRectEnum.Solid;
    c1FlexGrid1.AllowSorting = AllowSortingEnum.None;
}
- Convert the extraction result to XML format so that it can be displayed as a hierarchy in the FlexGrid control, by adding the following line of code in Form1.cs:
//Convert the JSON to XML (JsonConvert is part of the Newtonsoft.Json package; XmlDocument is in System.Xml).
XmlDocument doc =
JsonConvert.DeserializeXmlNode(extractedResult.ToJsonString(), "ExtractedResult");
- Read the XML document and populate the FlexGrid with XML data by defining the following method in Form1.cs:
private void GetXMLData(XmlNode node, int level)
{
    // Skip comment nodes.
    if (node.NodeType == XmlNodeType.Comment)
        return;
    // Add a new row for this node.
    int row = c1FlexGrid1.Rows.Count;
    c1FlexGrid1.Rows.Add();
    if (node.Name.Equals("Property"))
        c1FlexGrid1[row, 0] = node.Attributes["Name"].Value;
    else
        c1FlexGrid1[row, 0] = node.Name;
    if (node.ChildNodes.Count == 1)
    {
        c1FlexGrid1[row, 1] = node.InnerText;
        c1FlexGrid1.SetCellStyle(row, 1, c1FlexGrid1.Styles["Data"]);
    }
    // Make the new row a tree node.
    c1FlexGrid1.Rows[row].IsNode = true;
    c1FlexGrid1.Rows[row].Node.Level = level;
    // If this node has children, get them as well.
    if (node.ChildNodes.Count > 1)
    {
        // Get the children recursively.
        foreach (XmlNode child in node.ChildNodes)
            GetXMLData(child, level + 1);
    }
}
- Call the ‘GetXMLData’ method defined above in Form1.cs to display the extraction results as a hierarchy in the FlexGrid control.
//Populate the FlexGrid with the XML data.
GetXMLData(doc.ChildNodes[0].ChildNodes[1], 0);
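The traversal performed by GetXMLData can be tried without a grid. The following sketch uses only System.Xml and flattens an XML tree into indented lines in the same depth-first order. The class name XmlOutlineSketch, its methods, and the sample XML are hypothetical. Unlike GetXMLData, it treats an element as a leaf only when its single child is a text node, which is a design choice of this sketch rather than part of the walkthrough.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XmlOutlineSketch
{
    // Returns one line per element, indented by two spaces per nesting level.
    public static List<string> Flatten(XmlNode node, int level = 0)
    {
        var rows = new List<string>();
        Collect(node, level, rows);
        return rows;
    }

    private static void Collect(XmlNode node, int level, List<string> rows)
    {
        // Only elements become rows; comments and bare text nodes are skipped.
        if (node.NodeType != XmlNodeType.Element)
            return;

        string indent = new string(' ', level * 2);

        // A leaf holds exactly one text child: show its value on the same line.
        bool isLeaf = node.ChildNodes.Count == 1
                      && node.FirstChild.NodeType == XmlNodeType.Text;
        if (isLeaf)
        {
            rows.Add(indent + node.Name + " = " + node.InnerText);
            return;
        }

        // Otherwise emit the element name and recurse into its children.
        rows.Add(indent + node.Name);
        foreach (XmlNode child in node.ChildNodes)
            Collect(child, level + 1, rows);
    }
}
```

Loading a small document with XmlDocument.LoadXml and passing its DocumentElement to Flatten yields the outline rows. Checking the child's node type means that an element wrapping a single child element is still expanded instead of being shown as a value.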
Display extraction results in the FlexPie control:
To display the extraction results in a FlexPie control, we need a data source built from these results that can be bound to the chart. To create it, we define classes whose members map to the extraction results and bind the chart to the resulting objects.
- Create a class named ‘OrderedArticle’ that represents each item in the list of ordered items and therefore corresponds to the repeated placeholders. Each mapped property in the ‘OrderedArticle’ class has a DataMemberAttribute whose ‘Name’ property matches the name of a repeated placeholder.
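public class OrderedArticle
{
    [DataMember(Name = "ArticleName")]
    public String Article_Name { get; set; }
    [DataMember(Name = "ArticleSeller")]
    public String Article_Seller { get; set; }
    [DataMember(Name = "ArticlePrice")]
    public String Article_Price { get; set; }
    // Numeric price for charting: keeps only digits and the decimal point.
    // Read-only, because the value is always derived from Article_Price
    // (the original setter assigned the property to itself and would recurse endlessly).
    public Decimal ArticlePriceInDecimals
    {
        get
        {
            return decimal.Parse(Regex.Replace(Article_Price, @"[^\d.]", ""),
                System.Globalization.CultureInfo.InvariantCulture);
        }
    }
}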
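The price conversion in ‘OrderedArticle’ throws if the extracted text contains no usable number. The sketch below is one defensive variant that uses the same regular expression but reports failure instead of throwing. The class name PriceParsingSketch, the method name TryParsePrice, and the sample price strings are hypothetical, and the sketch assumes that prices use a period as the decimal separator.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class PriceParsingSketch
{
    // Tries to turn a display price such as "$1,299.00" into a decimal.
    // Returns false (and leaves price at 0) when no valid number remains.
    public static bool TryParsePrice(string text, out decimal price)
    {
        price = 0m;
        if (string.IsNullOrWhiteSpace(text))
            return false;

        // Same filter as the walkthrough: drop everything except digits and periods.
        string digits = Regex.Replace(text, @"[^\d.]", "");

        // Parse culture-independently so that "12.99" is never read as 1299.
        return decimal.TryParse(digits, NumberStyles.AllowDecimalPoint,
                                CultureInfo.InvariantCulture, out price);
    }
}
```

One case worth noting: a currency prefix that itself contains a period, such as "Rs. 1,299.00", leaves a stray leading period after filtering. This sketch then returns false, whereas decimal.Parse would throw a FormatException.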
- Create a class named ‘AmazonTemplateRepeatedBlocks’ with a property whose DataMemberAttribute ‘Name’ corresponds to the name of the repeated block (‘OrderedArticles’) to which the repeated placeholders belong.
public class AmazonTemplateRepeatedBlocks
{
    [DataMember(Name = "OrderedArticles")]
    public List<OrderedArticle> Ordered_Items { get; set; }
}
- Retrieve the information about the ordered articles into the custom collection of class objects using the Get method of the IExtractionResult interface as shown:
List<OrderedArticle> articles =
    extractedResult.Get<AmazonTemplateRepeatedBlocks>().Ordered_Items;
- Finally, to populate the FlexPie with the names and prices of the ordered items, add the following code to Form1.cs:
//Populate the FlexPie with the extracted results.
flexPie1.DataSource = articles;
flexPie1.Binding = "ArticlePriceInDecimals";
flexPie1.BindingName = "Article_Name";
//Other settings.
flexPie1.Legend.Position = C1.Chart.Position.Right;
flexPie1.Legend.ItemMaxWidth = 350;
flexPie1.Legend.TextWrapping = C1.Chart.TextWrapping.Wrap;
- Run the application. Observe that the controls are populated with the extracted results as shown in the image below: