Resources
  • Home
  • December-Chrono
  • Psalm136Personalized
  • ExcelResources
    • 99PublicFunctions
    • aDelimitedCodeTemplate
    • aCodeTemplate
    • aBetter_Code_Template
    • ApplyFormulaValue
    • ArrayPublic
    • ArrayMatchCopyRow
    • Capitalization
    • Colors
    • DataTypes
    • DeleteRows
    • Filter
    • FormatStandard
    • Forms
    • Grid
    • Headers
    • ImageScrape
    • InputBox
  • ExcelResources2
    • Like
    • List Review
    • MatchAndReturn
    • mod_Import
    • Numbers
    • PDF_Parse
    • RandomGenerator
    • iRe1
    • Rept
    • RelatedItemsLIst
    • RomanNumerals
    • SheetSplitter
    • Sort
    • Split Method
    • String_DataType
    • TabDelimited
    • Tab_InsertNew_or_ClearExisting
    • Unique_List
      • UniqueLIsts_FilterMethod
  • Excelresources_Files
    • FileLists
    • FileTextParse
    • Folders
  • excelresources_WebScrape
    • CreateList_1-Href_2-Jpg
    • TableData_Load_PlugPlay
    • CreateList_MultiPage_Image_Link
      • ImageDownload_WebLink_List
    • TableData_Load_ClassDiv
    • TableData_Load_ClassSpan
  • Houseboat Vacation
  • At the Foot of the King
  • Devotions
  • Mentoring
  • Inspirational Stories
    • Vacuum Lifting
  • Job Help
  • Resources
    • MS
    • Florida History
  • Contact
  • Heroscape
    • Fish
    • Playstation
  • You Tube Videos
  • House
    • Tiny Houses
    • Backyards
  • CampingResources
    • CampingLand
    • LED Throwies
  • Interesting
  • Recipes
  • Xbox
  • Skydiving
  • ReadingPlan-5Year
  • Desoto
  • BibleStudy
  • HurricaneIrma
  • Home
  • December-Chrono
  • Psalm136Personalized
  • ExcelResources
    • 99PublicFunctions
    • aDelimitedCodeTemplate
    • aCodeTemplate
    • aBetter_Code_Template
    • ApplyFormulaValue
    • ArrayPublic
    • ArrayMatchCopyRow
    • Capitalization
    • Colors
    • DataTypes
    • DeleteRows
    • Filter
    • FormatStandard
    • Forms
    • Grid
    • Headers
    • ImageScrape
    • InputBox
  • ExcelResources2
    • Like
    • List Review
    • MatchAndReturn
    • mod_Import
    • Numbers
    • PDF_Parse
    • RandomGenerator
    • iRe1
    • Rept
    • RelatedItemsLIst
    • RomanNumerals
    • SheetSplitter
    • Sort
    • Split Method
    • String_DataType
    • TabDelimited
    • Tab_InsertNew_or_ClearExisting
    • Unique_List
      • UniqueLIsts_FilterMethod
  • Excelresources_Files
    • FileLists
    • FileTextParse
    • Folders
  • excelresources_WebScrape
    • CreateList_1-Href_2-Jpg
    • TableData_Load_PlugPlay
    • CreateList_MultiPage_Image_Link
      • ImageDownload_WebLink_List
    • TableData_Load_ClassDiv
    • TableData_Load_ClassSpan
  • Houseboat Vacation
  • At the Foot of the King
  • Devotions
  • Mentoring
  • Inspirational Stories
    • Vacuum Lifting
  • Job Help
  • Resources
    • MS
    • Florida History
  • Contact
  • Heroscape
    • Fish
    • Playstation
  • You Tube Videos
  • House
    • Tiny Houses
    • Backyards
  • CampingResources
    • CampingLand
    • LED Throwies
  • Interesting
  • Recipes
  • Xbox
  • Skydiving
  • ReadingPlan-5Year
  • Desoto
  • BibleStudy
  • HurricaneIrma
Connect on Facebook

Create Link/Image List
Multi-Page

Some pages split results into multiple pages

3 Parts:
Sub: Opens Browser, loads each page and creates a list of all the links on that page
Function: Uses HTML tags to get a count of the estimated number of records
Function: Creates a brand new tab to dump the results on.
Makes the assumption that there is text within the HTML code indicating the count of records.
Picture
Sub CreateListofProducts_Multipage()
'http://www.kevinmccane.com/CreateList_MultiPage_Image_Link
'References: [Microsoft HTML Object Library][Microsoft Internet Controls]


Dim sNow As String
sNow = "Started reviewing " & Now

'--------------------------------------------------------------------------------------------------
'Website Specific Tags:

Const sQuote As String = """"

Const sProductOpen As String = "<div class=" & sQuote & "product swatch" & sQuote
Const sProductClose As String = "</div>"

Const sSkuOpen As String = "product_sku="
Const sSkuClose As String = ">"

Const sLinkOpen As String = "<a href="
Const sLinkClose As String = ">"

Const sImageOpen As String = "original="
Const sImageClose As String = ">"

Dim iBlockO As Long, iBlockC As Long
Dim iSKUo As Long, iSKUc As Long
Dim iLinkO As Long, iLinkC As Long
Dim iImageO As Long, iImageC As Long

Dim sBlockTemp As String
Dim sSKUtemp As String
Dim sLinkTemp As String
Dim sImageTemp As String

Const sBaseWebsite As String = "https://www.*****.com"
'--------------------------------------------------------------------------------------------------
'Static values:


Const sOpenDIV As String = "<div class=" & sQuote & "hidden results-count" & sQuote & ">"
Const sCloseDIV As String = "</div>"
Const sDivClassOpen As String = "<div class=" & sQuote & "product swatch"
Const sReplaceClassSku As String = sQuote & " product_sku=" & sQuote

'to refer to the running copy of Internet Explorer
Dim oWebBrowser As InternetExplorer
Dim sTempHTML As String
Dim sHTML As HTMLDocument
Dim iStartDIV As Long, iEndDIV As Long, sDIV As String
Dim sWebLink As String, iDivideBy As Long

'Working with the Strings:
Dim iCharA As Long, iCharB As Long, iCharC As Long, iCharD As Long
Dim sTempA As String, sTempB As String, iNextRow As String

'Standard Objects:
Dim Ws1 As Long, iLoop As Long, iPageLoop As Long
Dim iPage As Long
Dim iCounter As Long, iMaxPage As Long, iPageLimit As Long
Dim sTabName As String

'Array:
Dim aDump() As Variant

'DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
'Reset any values from previous execution / Set standard Objects:

Erase aDump
iCounter = 0
iNextRow = 1
sWebLink =sBaseWebsite &  "/search/side?page=" & iPage & "&sorting=pattern|asc"
iPage = 1
iCharA = 1
'DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD

'#########################################################################

gPromptForNewTabName:
    sTabName = InputBox("Creating a new tab to put results on - Please Name the new tab", "Note: Do not use the name of an existing tab [Enter 0 if you want to Cancel the process]", "WebLinks")
    If sTabName = "0" Then GoTo gMemoryCleanupNoBrowserInitiated
    '+++++++++++++++++++++++++++++++++++++++++++++++++++++++
gCreateNewTab:
    InsertTabRename (sTabName)
    Ws1 = Worksheets(sTabName).Index
    Worksheets(Ws1).Select
    '+++++++++++++++++++++++++++++++++++++++++++++++++++++++
    
gPromptForDivider:
    'iDivideBy = InputBox("Enter the # of products that show on page of of the link (will be used to determine how many pages to scrape)", "Total Records / Products per page = # of pages scraped", 36)
'#########################################################################


'MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

'Get the Max Count (based on string at the bottom of the page)
    'open Internet Explorer in memory, and go to website

        Set oWebBrowser = New InternetExplorer
        oWebBrowser.Visible = True
        
        oWebBrowser.Navigate sBaseWebsite &  "/search/side?page=" & 1 & "&sorting=pattern|asc"
    
    'Wait until oWebBrowser is done loading page
        Do While oWebBrowser.READYSTATE <> READYSTATE_COMPLETE
            Application.StatusBar = "Trying to go to website ..."
            DoEvents
        Loop

    'show text of HTML document returned
        Set sHTML = oWebBrowser.Document
        sTempHTML = sHTML.DocumentElement.innerHTML
        ':::::::::::::::::::::::::::::::::::::::::::::::::::
        '    'check the end of the document for the record count

            iStartDIV = InStr(1, sTempHTML, sOpenDIV, vbTextCompare)
            iEndDIV = InStr(iStartDIV + 1, sTempHTML, sCloseDIV, vbTextCompare)
            sDIV = Mid(sTempHTML, iStartDIV, iEndDIV - iStartDIV + Len(iEndDIV) + 2)
            iMaxPage = HTMLInsideTag(sDIV, sOpenDIV, sCloseDIV)
            iPageLimit = iMaxPage * 2
        ':::::::::::::::::::::::::::::::::::::::::::::::::::
        'Takes double the count (more results than site claimed it was - caused ubound(array) error

            aDump = Range(Worksheets(Ws1).Cells(1, 1).Address, Worksheets(Ws1).Cells(iPageLimit, 3).Address)
        ':::::::::::::::::::::::::::::::::::::::::::::::::::
'MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM


gLoopThroughHiddenPages:
For iPageLoop = 1 To iPageLimit
iPage = iPageLoop
'WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
'open Internet Explorer in memory, and go to website

    sWebLink =sBaseWebsite &  "/search/side?page=" & iPage & "&sorting=pattern|asc"
    oWebBrowser.Navigate sWebLink

    'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    'NOTE: Only need to set a new browser window 1 time - moved OUTSIDE of the loop
    'xxx Set oWebBrowser = New InternetExplorer
    'xxx oWebBrowser.Visible = False
    'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

    
'Wait until oWebBrowser is done loading page
 
   Do While oWebBrowser.READYSTATE <> READYSTATE_COMPLETE
        Application.StatusBar = "Trying to go to website ..." & iPage
        DoEvents
    Loop

'show text of HTML document returned
  
  Set sHTML = oWebBrowser.Document
    sTempHTML = sHTML.DocumentElement.innerHTML
    
    'EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
    'Create an exit point:

    If Len(sTempHTML) < 1500 Then GoTo gReviewFinished
    'EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
    
gStartReviewingHTMLstring:
DoEvents

'!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
'Reset to the first character position every time a new page # loads:

iCharA = 1
    'Provide visual code input (In addition to the application.toolbar)
    If Right(iPage, 1) = "0" Then Debug.Print iPage & " of " & iMaxPage & " " & iNextRow
'!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

'!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Do Until iCharA = 0
    'parse out the data
    DoEvents

gResetTempStrings:
    sSKUtemp = ""
    sLinkTemp = ""
    sImageTemp = ""
    
'*********************************************************************************************
gExtractBlockOfProductData:
    'Get the first/last position
    iBlockO = InStr(iCharA, sHTML.DocumentElement.innerHTML, sProductOpen, vbTextCompare)            'Look for: "<div class="product swatch"
    iBlockC = InStr(iBlockO + 1, sHTML.DocumentElement.innerHTML, sProductClose, vbTextCompare)      'Look for: "</div>"
        'stop processing this page if no match on expected strings
        If iBlockO = 0 Or iBlockC = 0 Then GoTo gBlockResultsBad
    
gBlockResultsGood:
    'Make sure that we have a minimum of 1 character to work with
    If iBlockC <= iBlockO Then GoTo gNextThisPageLoop
    
        'Extract the string:
        sBlockTemp = Mid(sHTML.DocumentElement.innerHTML, iBlockO, iBlockC - iBlockO)
    
            'Move to the next step:
            GoTo gExtractSKUwithinBlock
    
gBlockResultsBad:
    'stop processing this page if no match on expected strings
    GoTo gNextThisPageLoop
'*********************************************************************************************

'*********************************************************************************************

gExtractSKUwithinBlock:
    iSKUo = InStr(1, sBlockTemp, sSkuOpen, vbTextCompare)                                           'Look for: "product_sku="
    iSKUc = InStr(iSKUo + 1, sBlockTemp, sSkuClose, vbTextCompare)                                  'Look for ">"
        'stop processing this page if no match on expected strings
        If iSKUo = 0 Or iSKUc = 0 Then GoTo gExtractLinkWithinBlock

gSkuResultsGood:
    'Make sure that we have a minimum of 1 character to work with
    If iSKUc <= iSKUo Then GoTo gExtractLinkWithinBlock
    
        'Extract the string
        'sSKUtemp = Mid(sBlockTemp, iSKUo, iSKUc - iSKUo)
        sSKUtemp = Replace(Mid(sBlockTemp, iSKUo, iSKUc - iSKUo), sSkuOpen, "", , , vbTextCompare)
        
            'Move to the next step:
            GoTo gExtractLinkWithinBlock
            
gSkuResultsBad:
    'stop processing this page if no match on expected strings
    GoTo gExtractLinkWithinBlock
'*********************************************************************************************

'*********************************************************************************************

gExtractLinkWithinBlock:
    iLinkO = InStr(1, sBlockTemp, sLinkOpen, vbTextCompare)                                           'Look for: "<a href="
    iLinkC = InStr(iLinkO + 1, sBlockTemp, sLinkClose, vbTextCompare)                                 'Look for ">"
        'stop processing this page if no match on expected strings
        If iLinkO = 0 Or iLinkC = 0 Then GoTo gExtractImageWithinBlock
        
gLinkResultsGood:
    'Make sure that we have a minimum of 1 character to work with
    If iLinkC <= iLinkO Then GoTo gExtractImageWithinBlock
    
        'Extract the string
        'sLinkTemp = Mid(sBlockTemp, iLinkO, iLinkC - iLinkO)
        sLinkTemp = Replace(Mid(sBlockTemp, iLinkO, iLinkC - iLinkO), sLinkOpen, "", , , vbTextCompare)
        
            'Move to the next step:
            GoTo gExtractImageWithinBlock
            
gLinkResultsBad:
    'stop processing this page if no match on expected strings
    GoTo gExtractImageWithinBlock
'*********************************************************************************************

'*********************************************************************************************

gExtractImageWithinBlock:
    iImageO = InStr(1, sBlockTemp, sImageOpen, vbTextCompare)                                           'Look for: "original="
    iImageC = InStr(iImageO + 1, sBlockTemp, sImageClose, vbTextCompare)                                'Look for ">"
        'stop processing this page if no match on expected strings
        If iImageO = 0 Or iImageC = 0 Then GoTo gReturnResultsLoadDumpCheckForNext
        
gImageResultsGood:
    'Make sure that we have a minimum of 1 character to work with
    If iImageC <= iImageO Then GoTo gReturnResultsLoadDumpCheckForNext
    
        'Extract the string
        'sImageTemp = Mid(sBlockTemp, iImageO, iImageC - iImageO)
        sImageTemp = Replace(Mid(sBlockTemp, iImageO, iImageC - iImageO), sImageOpen, "", , , vbTextCompare)
        
            'Move to the next step:
            GoTo gReturnResultsLoadDumpCheckForNext
gImageResultsBad:
    'stop processing this page if no match on expected strings
    GoTo gReturnResultsLoadDumpCheckForNext
'*********************************************************************************************


'*********************************************************************************************
'*********************************************************************************************
'*********************************************************************************************

gReturnResultsLoadDumpCheckForNext:
    'Replace any unecessary quotes within strings:
    sSKUtemp = Replace(sSKUtemp, sQuote, "", , , vbTextCompare)
    sLinkTemp = Replace(sLinkTemp, sQuote, "", , , vbTextCompare)
    sImageTemp = Replace(sImageTemp, sQuote, "", , , vbTextCompare)


        'set inext row (Plus increment the counter)
            iCounter = iCounter + 1
            If iNextRow > iMaxPage Then GoTo gReviewFinished
            iNextRow = iNextRow + 1
            
                'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
                'Alternate(Array Dump)

                aDump(iNextRow, 1) = sSKUtemp
                aDump(iNextRow, 2) = sLinkTemp
                aDump(iNextRow, 3) = sImageTemp
                'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
            
            
        'Increment the iCharA while on the same page
            Application.StatusBar = "Processed " & iNextRow & "..."
            iCharA = iBlockC
           
            GoTo gMoveToNextLoop
'*********************************************************************************************
'*********************************************************************************************
'*********************************************************************************************

        
gMoveToNextLoop:
    Loop


gNextThisPageLoop:
'!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Next iPageLoop


gMoveToNextPage:


gReviewFinished:
    Debug.Print sNow & " // Finished reviewing " & Now
'===================================================================================================
Worksheets(Ws1).Select
aDump(1, 1) = "SKU"
aDump(1, 2) = "Link"
aDump(1, 3) = "Image"
Set Destination = Range(Worksheets(Ws1).Cells(1, 1).Address)
Destination.Resize(UBound(aDump, 1), UBound(aDump, 2)).Value = aDump
'===================================================================================================

gMemoryCleanup:
'close down oWebBrowser and reset status bar
    oWebBrowser.Quit
    Set oWebBrowser = Nothing

gMemoryCleanupNoBrowserInitiated:
    Application.StatusBar = ""
    Erase aDump
End Sub



This is the function:

Function HTMLInsideTag(myInput As Variant, myOpenTag As Variant, myCloseTag As Variant) As String
Dim sFullText As String, sTagOpen As String, sTagClose As String

sFullText = myInput
sTagOpen = myOpenTag
sTagClose = myCloseTag


sFullText = Replace(sFullText, sTagOpen, "", , , vbTextCompare)
sFullText = Replace(sFullText, sTagClose, "", , , vbTextCompare)

HTMLInsideTag = sFullText
End Function
Powered by Create your own unique website with customizable templates.
  • Home
  • December-Chrono
  • Psalm136Personalized
  • ExcelResources
    • 99PublicFunctions
    • aDelimitedCodeTemplate
    • aCodeTemplate
    • aBetter_Code_Template
    • ApplyFormulaValue
    • ArrayPublic
    • ArrayMatchCopyRow
    • Capitalization
    • Colors
    • DataTypes
    • DeleteRows
    • Filter
    • FormatStandard
    • Forms
    • Grid
    • Headers
    • ImageScrape
    • InputBox
  • ExcelResources2
    • Like
    • List Review
    • MatchAndReturn
    • mod_Import
    • Numbers
    • PDF_Parse
    • RandomGenerator
    • iRe1
    • Rept
    • RelatedItemsLIst
    • RomanNumerals
    • SheetSplitter
    • Sort
    • Split Method
    • String_DataType
    • TabDelimited
    • Tab_InsertNew_or_ClearExisting
    • Unique_List
      • UniqueLIsts_FilterMethod
  • Excelresources_Files
    • FileLists
    • FileTextParse
    • Folders
  • excelresources_WebScrape
    • CreateList_1-Href_2-Jpg
    • TableData_Load_PlugPlay
    • CreateList_MultiPage_Image_Link
      • ImageDownload_WebLink_List
    • TableData_Load_ClassDiv
    • TableData_Load_ClassSpan
  • Houseboat Vacation
  • At the Foot of the King
  • Devotions
  • Mentoring
  • Inspirational Stories
    • Vacuum Lifting
  • Job Help
  • Resources
    • MS
    • Florida History
  • Contact
  • Heroscape
    • Fish
    • Playstation
  • You Tube Videos
  • House
    • Tiny Houses
    • Backyards
  • CampingResources
    • CampingLand
    • LED Throwies
  • Interesting
  • Recipes
  • Xbox
  • Skydiving
  • ReadingPlan-5Year
  • Desoto
  • BibleStudy
  • HurricaneIrma