Resources
  • Home
  • December-Chrono
  • Psalm136Personalized
  • ExcelResources
    • 99PublicFunctions
    • aDelimitedCodeTemplate
    • aCodeTemplate
    • aBetter_Code_Template
    • ApplyFormulaValue
    • ArrayPublic
    • ArrayMatchCopyRow
    • Capitalization
    • Colors
    • DataTypes
    • DeleteRows
    • Filter
    • FormatStandard
    • Forms
    • Grid
    • Headers
    • ImageScrape
    • InputBox
  • ExcelResources2
    • Like
    • List Review
    • MatchAndReturn
    • mod_Import
    • Numbers
    • PDF_Parse
    • RandomGenerator
    • iRe1
    • Rept
    • RelatedItemsLIst
    • RomanNumerals
    • SheetSplitter
    • Sort
    • Split Method
    • String_DataType
    • TabDelimited
    • Tab_InsertNew_or_ClearExisting
    • Unique_List
      • UniqueLIsts_FilterMethod
  • Excelresources_Files
    • FileLists
    • FileTextParse
    • Folders
  • excelresources_WebScrape
    • CreateList_1-Href_2-Jpg
    • TableData_Load_PlugPlay
    • CreateList_MultiPage_Image_Link
      • ImageDownload_WebLink_List
    • TableData_Load_ClassDiv
    • TableData_Load_ClassSpan
  • Houseboat Vacation
  • At the Foot of the King
  • Devotions
  • Mentoring
  • Inspirational Stories
    • Vacuum Lifting
  • Job Help
  • Resources
    • MS
    • Florida History
  • Contact
  • Heroscape
    • Fish
    • Playstation
  • You Tube Videos
  • House
    • Tiny Houses
    • Backyards
  • CampingResources
    • CampingLand
    • LED Throwies
  • Interesting
  • Recipes
  • Xbox
  • Skydiving
  • ReadingPlan-5Year
  • Desoto
  • BibleStudy
  • HurricaneIrma
  • Home
  • December-Chrono
  • Psalm136Personalized
  • ExcelResources
    • 99PublicFunctions
    • aDelimitedCodeTemplate
    • aCodeTemplate
    • aBetter_Code_Template
    • ApplyFormulaValue
    • ArrayPublic
    • ArrayMatchCopyRow
    • Capitalization
    • Colors
    • DataTypes
    • DeleteRows
    • Filter
    • FormatStandard
    • Forms
    • Grid
    • Headers
    • ImageScrape
    • InputBox
  • ExcelResources2
    • Like
    • List Review
    • MatchAndReturn
    • mod_Import
    • Numbers
    • PDF_Parse
    • RandomGenerator
    • iRe1
    • Rept
    • RelatedItemsLIst
    • RomanNumerals
    • SheetSplitter
    • Sort
    • Split Method
    • String_DataType
    • TabDelimited
    • Tab_InsertNew_or_ClearExisting
    • Unique_List
      • UniqueLIsts_FilterMethod
  • Excelresources_Files
    • FileLists
    • FileTextParse
    • Folders
  • excelresources_WebScrape
    • CreateList_1-Href_2-Jpg
    • TableData_Load_PlugPlay
    • CreateList_MultiPage_Image_Link
      • ImageDownload_WebLink_List
    • TableData_Load_ClassDiv
    • TableData_Load_ClassSpan
  • Houseboat Vacation
  • At the Foot of the King
  • Devotions
  • Mentoring
  • Inspirational Stories
    • Vacuum Lifting
  • Job Help
  • Resources
    • MS
    • Florida History
  • Contact
  • Heroscape
    • Fish
    • Playstation
  • You Tube Videos
  • House
    • Tiny Houses
    • Backyards
  • CampingResources
    • CampingLand
    • LED Throwies
  • Interesting
  • Recipes
  • Xbox
  • Skydiving
  • ReadingPlan-5Year
  • Desoto
  • BibleStudy
  • HurricaneIrma
Connect on Facebook

Create Lists:
Href
JPG

Web Scrape:

By following the prompts, 1 of 2 types of lists can be generated for a web page
1) http://www.***.com
2) www.***.com/folder/subfolder/ImageName.JPG

Use "href=" to get a list of links:

Picture

The result will look something like this:

Picture


Use "src=" to get a list of Image Links:

Picture

The result will look something like this:

Picture


If your HTML link contains the string [pagesize=] followed by [&]
     and...
Allows you the option of setting the number of results per page

The code will look for "pageSize" and the following [&].
It will prompt for the numeric value to use
To see all 58 for the example on the right, enter 58 at the prompt
Note: Website may allow for input but still not allow you to see ALL records.  Code will only get Page 1 in these cases.
Picture
Picture



Here is the code:

Sub WebScrape_CreateList_Hyperlinks_ALL_HREF()
'--------------------------------------------------------------------------------
'Method Extracts each string: Start(Input:"href") End:("[2nd Quotation Mark])
'Creates a blank sheet - Not Active Sheet
'No matching to any existing data - based ONLY on Input:[Web Page Link]
'NOTE: !!! Will only handle the [[[ 1st ]]] page of data (Unless web link allows for ALL results to be loaded into a single page)
'--------------------------------------------------------------------------------
'Note: Calls FUNCTION: [InsertTabRename] (available on PublicFunctions Module)
'--------------------------------------------------------------------------------
'Note: Requires References:
    '[Microsoft Internet Controls]
    '[Microsoft HTML Object Library]
'--------------------------------------------------------------------------------
'Note: Requires Public code within workbook:
    'Enum READYSTATE
    'READYSTATE_UNINITIALIZED = 0
    'READYSTATE_LOADING = 1
    'READYSTATE_LOADED = 2
    'READYSTATE_INTERACTIVE = 3
    'READYSTATE_COMPLETE = 4
    'End Enum

'--------------------------------------------------------------------------------

Dim sWebLink As String, sQuote As String, sTempA As String, sTempB As String, sNBSP As String
Dim iLoop As Long, Ws1 As Long, iNextRow As Long, iTempA As Long, iTempB As Long, iTempC As Long, iUboundR As Long
Dim sHref As String
Dim aDump() As Variant
Dim iMaxRecords As Long, iEndMaxChar As Long, sUserInputMaxRecords  As String, sPageSize As String, sEndMaxChar As String
Dim bFound As Boolean, iLoopDump As Long


'VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
gCreateNamesOfAllExistingTabs:
    For iLoop = 1 To ActiveWorkbook.Worksheets.Count
        sTempA = sTempA & "^" & Worksheets(iLoop).Name & "^"
    Next iLoop
    
gGetInputForNewTabToBeAdded:
DoEvents
    sTempB = InputBox("Creating a new tab to put results on - Please Name the new tab", "Note: Do not use the name of an existing tab [Enter 0 if you want to Cancel the process]", "WebLinks")
    If sTempB = "0" Then GoTo gMemoryCleanupNoBrowserInitiated
    
    If InStr(1, sTempA, "^" & sTempB & "^", vbTextCompare) > 0 Then GoTo gAttemptedTabNameIsAlreadyInUse
    GoTo gCreateNewTab
    
gAttemptedTabNameIsAlreadyInUse:
DoEvents
    sTempB = InputBox("The name you tried to use is already taken - try again.", "Note: DO NOT use the name of an existing tab", "WebLinks")
    If InStr(1, sTempA, "^" & sTempB & "^", vbTextCompare) > 0 Then GoTo gAttemptedTabNameIsAlreadyInUse
    GoTo gCreateNewTab
    
    '+++++++++++++++++++++++++++++++++++++++++++++++++++++++
gCreateNewTab:
    InsertTabRename (sTempB)
    Ws1 = Worksheets(sTempB).Index
    '+++++++++++++++++++++++++++++++++++++++++++++++++++++++
    
    'EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
gNewTabHasBeenCreated:
    'Debug.Print "Tab created - ok to proceed"
'VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV




gPrepObjects:
'OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
'Erase an pre-existing Arrays
    Erase aDump

'Load Index/Strings required for process to work
    iTempA = 1
    iNextRow = 1
    iUboundR = 1

'Tools(Instr)
    sQuote = """"
    sNBSP = " "
    sPageSize = "pageSize="
    sEndMaxChar = "&"
    
'Recycle Objects:
    sTempA = ""
    sTempB = ""
'OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

gInputVariables:
'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
'Get Input (Or objects that could be loaded through function)
    sWebLink = InputBox("Enter the web page to review and extract links from", "Create List of All Links From Webpage:", "https://www.***.com")
    
    
gCheckForWebStringResultsPerPageInput:
    'Check for Input option in the web string:
    If InStr(1, sWebLink, sPageSize, vbTextCompare) = 0 Then GoTo gPromptForLinkStartMarker
    
    '(Items Per Page) ####################################################### 5/6/18 - start new code (removes separation of multiple pages - if possible)
gWebStringAllowsRecordCountInput:
    'Get the Character Positions of the [1st] and [2nd] marker
    iMaxRecords = InStr(1, sWebLink, sPageSize, vbTextCompare)
    iEndMaxChar = InStr(iMaxRecords + 1, sWebLink, sEndMaxChar, vbTextCompare)
        
        'Skip the process if the Closing character was not found
        If iEndMaxChar <= iMaxRecords Then GoTo gPromptForLinkStartMarker
            sUserInputMaxRecords = InputBox("Enter the Max # of records available using this search link.", "If Search Results Count = '58', use 58.", "58")
            sWebLink = Left(sWebLink, iMaxRecords - 1) & sPageSize & sUserInputMaxRecords & sEndMaxChar & Right(sWebLink, Len(sWebLink) - iEndMaxChar - Len(sEndMaxChar) + 1)
    '(Items Per Page) ####################################################### 5/6/18 - start new code (removes separation of multiple pages - if possible)

gPromptForLinkStartMarker:
'Get "Start" Marker string:
    sHref = InputBox("Enter the string that is used to mark the beginning of each link", "Typical options:[href=(Web Links][src=(Image Links)]", "href=")
'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII


'===================================================================================================
gCreateWebBrowserObject:
'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
'Create a web browser (hidden)
Set oWebBrowser = New InternetExplorer
oWebBrowser.Visible = False
'BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
'===================================================================================================
'===================================================================================================
'WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW

'Navigate to the page for data extraction
    oWebBrowser.navigate sWebLink
'WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
'Wait until IE is done loading page

    Do While oWebBrowser.READYSTATE <> READYSTATE_COMPLETE
        Application.StatusBar = "Trying to go to website... "
        DoEvents
    Loop
'WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
gCreateStringObjectUsingHTMLcode:
'show text of HTML document returned
    Set sHTML = oWebBrowser.document
    sTempA = sHTML.DocumentElement.innerHTML
'WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
'===================================================================================================


gDefineArrayHTML:
'===================================================================================================
Worksheets(Ws1).Select

'Get a count of all instances of our string marker(href=) to use for the Ubound(array) limit
    iUboundR = Len(sTempA) - Len(Replace(sTempA, sHref, "", , , vbTextCompare))
    iUboundR = iUboundR / Len(sHref)
    aDump = Range(Worksheets(Ws1).Cells(1, 1), Worksheets(Ws1).Cells(iUboundR, 1).Address)
'===================================================================================================

gPreCleanHTMLstring:
DoEvents
'(Cleanup)CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
    'remove the carriage returns:

    sTempA = Replace(sTempA, Chr(13), "", , , vbTextCompare)
    sTempA = Replace(sTempA, Chr(10), "", , , vbTextCompare)
'(Cleanup)CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

'LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
'LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL

gWebReviewHTMLstring:
'LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
    
gParseText:
'PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP
'get the beginning (href=) and end (quote) positions

    iTempB = InStr(iTempA, sTempA, sHref, vbTextCompare)
    iTempC = InStr(iTempB + Len(sHref) + 1, sTempA, sQuote, vbTextCompare)
    
    'VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
    If iTempC = 0 Or iTempB = 0 Or iTempC < iTempB Then GoTo gStringErrorMID
    'VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
    
    'Convert results to a string

    sTempB = Mid(sTempA, iTempB, iTempC - iTempB)
        GoTo gImageExtractLink
'EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
gStringErrorMID:
    Debug.Print "Error occured when setting Alpha/Omega MID positions after Link #: " & iNextRow
    GoTo gImageIncrementStartSearchPosition
'EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
'PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP




gImageExtractLink:
DoEvents
'+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    'Get rid of the extra characters not needed (src=) & (quote)

    sTempB = Replace(sTempB, sHref, "", , , vbTextCompare)
    sTempB = Replace(sTempB, sQuote, "", , , vbTextCompare)
        
       
            '(Prevent Dupes) DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD Start: 5/6/18 - added code to prevent dupes (loop through loaded links to check for dupe)
            bFound = False
            For iLoopDump = iNextRow To LBound(aDump) Step -1
                If UCase(sTempB) = UCase(aDump(iLoopDump, 1)) Then bFound = True
                'skip the step that adds the new link to the list because it is a dupe
                If bFound = True Then
                    GoTo gImageIncrementStartSearchPosition
                End If
            Next iLoopDump
            '(Prevent Dupes) DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD End: 5/6/18 - added code to prevent dupes (loop through loaded links to check for dupe)
        
gNewLinkToAdd:
    'Increment the counter to set the next Array Row
    iNextRow = iNextRow + 1
    If Right(iNextRow, 2) = "00" Then Application.StatusBar = "Loading Link #:(" & iNextRow & ") of " & iUboundR
    
    'Load the Array with the extracted value:
        aDump(iNextRow, 1) = sTempB
        GoTo gImageIncrementStartSearchPosition
'+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

'(MoveNext):::::::::::::::::::::::::::::::::::::::::::::::::::::
gImageIncrementStartSearchPosition:
    iTempA = InStr(iTempC + 1, sTempA, sHref, vbTextCompare)
        If iTempA = 0 Then GoTo gFinishedWithReview
        GoTo gWebReviewHTMLstring
'(MoveNext):::::::::::::::::::::::::::::::::::::::::::::::::::::

'LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL
'LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL



gFinishedWithReview:
'(Dump) DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
'===================================================================================================

Worksheets(Ws1).Select
Set Destination = Range(Worksheets(Ws1).Cells(1, 1).Address)
Destination.Resize(UBound(aDump, 1), UBound(aDump, 2)).Value = aDump
'===================================================================================================
'(Dump) DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD

    GoTo gMemoryCleanup
'(Dump) DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD



'EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

gErrorHandlingCaleb:
    Debug.Print "Last checked: " & myLoadRow
    Debug.Print "Error occured after Link #: " & iNextRow
    ActiveWorkbook.Save
'EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

'MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
gMemoryCleanup:
'MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
'(Close down oWebBrowser) / (Reset status bar)
    oWebBrowser.Quit
    Set oWebBrowser = Nothing

gMemoryCleanupNoBrowserInitiated:
    Application.StatusBar = ""

'Release Array memory
    Erase aDump
'MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

gHappyAdios:
MsgBox "Finished Creating list of Web Links"
End Sub
Powered by Create your own unique website with customizable templates.
  • Home
  • December-Chrono
  • Psalm136Personalized
  • ExcelResources
    • 99PublicFunctions
    • aDelimitedCodeTemplate
    • aCodeTemplate
    • aBetter_Code_Template
    • ApplyFormulaValue
    • ArrayPublic
    • ArrayMatchCopyRow
    • Capitalization
    • Colors
    • DataTypes
    • DeleteRows
    • Filter
    • FormatStandard
    • Forms
    • Grid
    • Headers
    • ImageScrape
    • InputBox
  • ExcelResources2
    • Like
    • List Review
    • MatchAndReturn
    • mod_Import
    • Numbers
    • PDF_Parse
    • RandomGenerator
    • iRe1
    • Rept
    • RelatedItemsLIst
    • RomanNumerals
    • SheetSplitter
    • Sort
    • Split Method
    • String_DataType
    • TabDelimited
    • Tab_InsertNew_or_ClearExisting
    • Unique_List
      • UniqueLIsts_FilterMethod
  • Excelresources_Files
    • FileLists
    • FileTextParse
    • Folders
  • excelresources_WebScrape
    • CreateList_1-Href_2-Jpg
    • TableData_Load_PlugPlay
    • CreateList_MultiPage_Image_Link
      • ImageDownload_WebLink_List
    • TableData_Load_ClassDiv
    • TableData_Load_ClassSpan
  • Houseboat Vacation
  • At the Foot of the King
  • Devotions
  • Mentoring
  • Inspirational Stories
    • Vacuum Lifting
  • Job Help
  • Resources
    • MS
    • Florida History
  • Contact
  • Heroscape
    • Fish
    • Playstation
  • You Tube Videos
  • House
    • Tiny Houses
    • Backyards
  • CampingResources
    • CampingLand
    • LED Throwies
  • Interesting
  • Recipes
  • Xbox
  • Skydiving
  • ReadingPlan-5Year
  • Desoto
  • BibleStudy
  • HurricaneIrma