A Simple IOC Extractor

Project page: jupyter-collection/iocextractor at main · fr0gger/jupyter-collection (github.com)

1 Description

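In the security field, individuals and organizations publish threat intelligence reports every week, and there are a lot of them. These reports contain many valuable IOCs (indicators of compromise), usually listed at the end of a blog post or in a supplementary document. Some lists are short and some are long, but either way, copying and pasting them by hand quickly becomes tedious. The good news is that several automatic IOC extractors are available on GitHub. This short note shows how to use the IoCExtract module from the MSTICpy library to pull IOCs out of a URL; the same approach works for any other text source.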
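
To make the idea concrete, here is a minimal sketch of what a regex-based IOC extractor does, using only the standard library. The patterns and the `extract_iocs` helper are simplified illustrations invented for this note, not the patterns MSTICpy ships; the result shape (a mapping from IOC type to a set of matches) is assumed to mirror what the extractor returns for a text input.

```python
import re
from collections import defaultdict

# Simplified, illustrative patterns (assumptions, NOT MSTICpy's own regexes).
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
}


def extract_iocs(text):
    """Return {ioc_type: set_of_matches} for every pattern that matched."""
    found = defaultdict(set)
    for ioc_type, pattern in IOC_PATTERNS.items():
        for match in pattern.findall(text):
            found[ioc_type].add(match)
    return dict(found)


report = """
The dropper contacts 203.0.113.7 and downloads http://malicious.example/payload.bin from there.
MD5: 44d88612fea8a8f36de82e1278abb02f
"""
iocs = extract_iocs(report)
for ioc_type, values in sorted(iocs.items()):
    print(ioc_type, sorted(values))
```

Collecting matches into sets removes duplicates, which matters for reports that repeat the same indicator in the body and again in an appendix.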

2 Limitations

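This is still at an early stage of development, so not every IOC extracted from a URL is malicious: the extractor cannot tell malicious URLs from legitimate ones. To work around this, I added a whitelist that removes wrongly extracted data. How well it works depends on the URL, though, and you may need to filter out more.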
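
The whitelist step can be sketched as follows. This is a minimal illustration, assuming the extracted IOCs are held as a mapping from IOC type to a set of strings and that each whitelist entry is a regular expression; `filter_whitelisted` and the sample values are hypothetical names made up for the example.

```python
import re


def filter_whitelisted(iocs, whitelist_patterns):
    """Return a copy of `iocs` without the values matching any whitelist regex."""
    # Drop blank entries: an empty regex matches every string.
    compiled = [re.compile(p) for p in whitelist_patterns if p.strip()]
    return {
        ioc_type: {v for v in values if not any(rx.search(v) for rx in compiled)}
        for ioc_type, values in iocs.items()
    }


iocs = {
    "dns": {"www.microsoft.com", "evil-c2.example"},
    "url": {
        "https://www.fireeye.com/blog/post.html",
        "http://evil-c2.example/gate.php",
    },
}
whitelist = [r"microsoft\.com$", r"^https?://www\.fireeye\.com/", ""]
cleaned = filter_whitelisted(iocs, whitelist)
print(cleaned)
```

Returning a new mapping instead of mutating the input keeps the raw extraction available, which helps when checking whether the whitelist removed too much.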

3 Planned improvements

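  • Improve the extraction
  • Reduce the amount of wrongly extracted data
  • Extract from multiple sources (PDF, text)
  • Add more regular expressions
  • Add more export formats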
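
As an illustration of the export item, the sketch below serializes a result to both JSON and CSV with the standard library. It assumes the same type-to-set mapping as the extraction result; `iocs_to_json` and `iocs_to_csv` are hypothetical helpers, not part of the project.

```python
import csv
import io
import json


def iocs_to_json(iocs):
    """Serialize the IOC mapping to a JSON string."""
    # Sets are not JSON serializable, so convert each one to a sorted list.
    return json.dumps({k: sorted(v) for k, v in iocs.items()}, indent=4, sort_keys=True)


def iocs_to_csv(iocs):
    """Serialize the IOC mapping to CSV text, one (type, value) pair per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ioc_type", "value"])
    for ioc_type in sorted(iocs):
        for value in sorted(iocs[ioc_type]):
            writer.writerow([ioc_type, value])
    return buffer.getvalue()


iocs = {"ipv4": {"203.0.113.7", "198.51.100.23"}, "dns": {"evil-c2.example"}}
json_text = iocs_to_json(iocs)
csv_text = iocs_to_csv(iocs)
print(json_text)
print(csv_text)
```

Sorting both the types and the values makes the output deterministic, so two runs over the same report can be compared with a plain diff.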

4 Code

Clone the code locally and install the required libraries:

[Figure: installing the tool's dependencies]

Run the following code in IPython (the widgets need a Jupyter front end, such as a notebook, to render):

# Imports and configuration
import os
import glob
import requests
import json
import re
import ipywidgets as widgets
import pandas as pd
from ipywidgets import Button, Layout, Checkbox
from IPython.display import display, HTML
from bs4 import BeautifulSoup
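try:
    from msticpy.transform import IoCExtract  # msticpy 2.0 and later
except ImportError:
    from msticpy.sectools import IoCExtract  # older msticpy releases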
# Loading Whitelists
searchdir = "whitelists/whitelist_*.txt"
fpaths = glob.glob(searchdir)
patterns = []
# Collect every whitelist file into a single list of regex strings
for fpath in fpaths:
    with open(fpath) as whitelist_file:
        # Skip blank lines: an empty regex would match every IOC
        patterns += [line.strip() for line in whitelist_file if line.strip()]
# Initiate the IOC extractor
ioc_extractor = IoCExtract()
# Adding btc regex
ioc_extractor.add_ioc_type(ioc_type='btc', ioc_regex='^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$')
# Configure widget
keyword = widgets.Text(
    value = "",
    placeholder = 'Enter the URL',
    description = 'Extract IOCs:',
    layout = Layout(width='90%', height='40px'),
    disabled = False
)
display(keyword)
# Configure checkboxes
checkbox_json = widgets.Checkbox(value = False, description="Json")
display(checkbox_json)
checkbox_table = widgets.Checkbox(value = False, description="Table")
display(checkbox_table)
# Configure click button
button = widgets.Button(description = "Extract IOCs", display='flex', layout = Layout(width='20%', height='40px', flex='3 1 0%'), icon = 'check', button_style='primary')
output = widgets.Output()
# Box layout
box_layout = widgets.Layout(display = 'flex', flex_flow='column', align_items='center', width='100%')
box = widgets.HBox(children = [button], layout = box_layout)
display(box)
# Searching for the input url
@output.capture()
def userInput(b):
    try:
        # Request to the url
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
        result = requests.get(keyword.value, headers=headers)
        soup = BeautifulSoup(result.text, 'html.parser')
        
        print("[+] Extracting IOC from: " + keyword.value)
        iocs_found = ioc_extractor.extract(str(soup.get_text()))
        if iocs_found:
            # Remove every element that matches a whitelist pattern
            for k in iocs_found:
                for i in iocs_found[k].copy():
                    # Ignore empty patterns, which would match everything
                    if any(w and re.search(w, i) for w in patterns):
                        iocs_found[k].remove(i)
            display(HTML('<h4> \nPotential IoCs found: </h4>'))
            
            # Get JSON result
            if checkbox_json.value is True:
                # Sets are not JSON serializable, so convert each one to a list
                ioc = {k: list(v) for k, v in iocs_found.items()}
                jsonioc = json.dumps(ioc, indent=4, sort_keys=True)
                print(jsonioc)
                
            # Get table result
            if checkbox_table.value is True:
                # DataFrame.append was removed in pandas 2.0, so build all rows first
                rows = [(k, i) for k, v in iocs_found.items() for i in v]
                ioctable = pd.DataFrame(rows, columns=["type", "value"])

                display(ioctable)
    
        else:
            print("no IOC found!")
        
    except requests.exceptions.RequestException as e:
        print(e)
    except(AttributeError, KeyError) as er:
        print(er)
    
# get the input url
button.on_click(userInput)
display(output)
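
One detail of the code above deserves a closer look: the `btc` pattern is anchored with `^` and `$`, so it can only match an address that fills a whole line (or the whole text). The sketch below demonstrates this with plain `re`. It assumes multiline matching; the flags that IoCExtract actually compiles custom patterns with are not verified here. The `BOUNDED` variant is a suggested alternative, not something the project uses.

```python
import re

# The pattern registered in the code above, copied as-is.
ANCHORED = r"^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$"
# Hypothetical alternative: word boundaries instead of line anchors.
BOUNDED = r"\b(?:[13][a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})\b"

address = "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"
inline_text = f"Send the ransom to {address} within 48 hours."
own_line_text = f"Wallet:\n{address}\n"

# The anchored pattern misses an address embedded in a sentence...
anchored_inline = re.findall(ANCHORED, inline_text, re.MULTILINE)
# ...but finds it when the address sits alone on its line.
anchored_own_line = re.findall(ANCHORED, own_line_text, re.MULTILINE)
# The word-bounded variant finds the embedded address as well.
bounded_inline = re.findall(BOUNDED, inline_text)

print(anchored_inline, anchored_own_line, bounded_inline)
```

Since text taken from a web page often places indicators inside sentences, the word-bounded form is likely to find more addresses, at the cost of a few more false positives.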

Host whitelist

eset.com$
kaspersky.com$
trendmicro.com$
metasploit.com$
secunia.com$
symantec.com$
cisco.com$
fireeye.com$
mandiant.com$
bluecoat.com$
normanshark.com$
norman.no$
norman.com$
rsa.com$
f-secure.com$
securelist.com$
mcafee.com$
secureworks.com$
zscaler.com$
sophos.com$
avg.com$
isightpartners.com$
eset.sk$
rapid7.com$
crowdstrike.com$
gdata.de$
gdatasoftware.com$
fortinet.com$
fidelissecurity.com$
virustotal.com$
usenix.org$
cve.mitre.org$
clean-mx.de$
malwaredomainlist.com$
contagiodump.blogspot.com$
malware.dontneedcoffee.com$
exploit-db.com$
citizenlab.org$
crysys.hu$
krebsonsecurity.com$
darkreading.com$
shadowserver.org$
google.com$
facebook.com$
youtube.com$
twitter.com$
microsoft.com$
msn.com$
live.com$
windows.com$
adobe.com$
wikipedia.org$
linkedin.com$
yahoo.com$
gmail.com$
googlemail.com$
gmx.com$
gmx.de$
hotmail.com$
outlook.com$
yandex.ru$
github.com$
arstechnica.com$
wired.com$
Snort.org
zdnet.com$
bbc.co.uk$
dailymail.co.uk$
spiegel.de$
reuters.com$
theregister.co.uk$
forbes.com$
heise.de$
nytimes.com$
washingtonpost.com$
cbsnews.com$
archive.zip$
blogblog.com$
www.blogger.com
www.bloomberg.com
www.clamav.net
www.gmer.net
www.google-analytics.com$
www.reddit.com
www.snort.org
www.softperfect.com
www.spamcop.net
www.talosintel.com
www.talosintelligence.com
www.w3.org
blog.clamav.net
blog.emsisoft.com
blog.snort.org
blog.talosintelligence.com
blogger.googleusercontent.com
blogspot.com
fonts.googleapis.com
schema.org
service.post
signup.umbrella.com
snort.org
static.cloudflareinsights.com
talosintelligence.com
therecord.media
blogs.cisco.com
twitter.com
cisco.com
google.com
linkedin.com
zdnet.Com
youtube.com
microsoft.com
github.com
facebook.com
google-analytics.com
blogblog.com
www.sentinelone.com
www.welivesecurity.com
microsoft.net
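
Because every entry in this list is applied as a regular expression, the entries without a trailing `$` and with unescaped dots match more than the bare domain. The sketch below shows the effect and one possible stricter form; `host_pattern` is a hypothetical helper, not part of the project.

```python
import re


def host_pattern(domain):
    """Build a regex that matches `domain` itself or any of its subdomains."""
    return re.compile(r"(?:^|\.)" + re.escape(domain) + r"$", re.IGNORECASE)


loose = re.compile("twitter.com")    # unanchored entry, as it appears in the list
strict = host_pattern("twitter.com")

for host in ["twitter.com", "api.twitter.com", "twitter.com.evil.example", "nottwitter.com"]:
    print(host, bool(loose.search(host)), bool(strict.search(host)))
```

With the loose form, a look-alike such as `twitter.com.evil.example` is treated as whitelisted and silently dropped from the results, which is the opposite of what a whitelist is meant to do.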

URL whitelist

^https?:\/\/www.fireeye.com\/
^https?:\/\/blog.fireeye.com\/
^https?:\/\/www.symantec.com\/
^https?:\/\/blog.kaspersky.com\/
^https?:\/\/blog.trendmicro.com\/
^https?:\/\/blogs.rsa.com\/
^https?:\/\/www.trendmicro.com\/
^https?:\/\/blog.trendmicro.com\/
^https?:\/\/blogs.norman.com\/
^https?:\/\/www.securelist.com\/
^https?:\/\/www.mcafee.com\/
^https?:\/\/blog.crysys.hu\/
^https?:\/\/blogs.cisco.com
^https?:\/\/tools.cisco.com\/security\/
^https?:\/\/www.secureworks.com\/research\/
^https?:\/\/threatexpert.com\/
^https?:\/\/www.f-secure.com\/weblog\/
^https?:\/\/nakedsecurity.sophos.com\/
^https?:\/\/blog.eset.com\/
^https?:\/\/www.gdata.de\/
^https?:\/\/www.sophos.com\/
^https?:\/\/normanshark.com\/
^https?:\/\/www.cve.mitre.org\/
^https?:\/\/www.virusbtn.com\/pdf\/
^https?:\/\/www.blackhat.com\/presentations\/
^https?:\/\/www.usenix.org\/
^https?:\/\/blogs.sans.org\/
^https?:\/\/www.shadowserver.org\/
^https?:\/\/contagiodump.blogspot.com\/
^https?:\/\/support.clean-mx.de\/
^https?:\/\/lists.clean-mx.com\/
^https?:\/\/citizenlab.org\/
^https?:\/\/www.eff.org\/document\/
^https?:\/\/www.exploit-db.com\/exploits\/
^https?:\/\/www.adobe.com\/support\/security\/
^https?:\/\/krebsonsecurity.com\/
^https?:\/\/en.wikipedia.org\/wiki\/
^https?:\/\/www.google.com\/
^https?:\/\/blogger.googleusercontent.com\/
^https?:\/\/apis.google.com\/
^https?:\/\/(?:[\w\-\_]+\.)+(?:google.com|google-analytics|googleapis).com\/
^https?:\/\/(?:[\w\-\_]+\.)+(?:talosintelligence|snort|blogger).com\/
^http:\/\/(?:[\w\-\_]+\.)+(?:talosintelligence).com\/
^http:\/\/(?:[\w\-\_]+\.)+(?:talosintelligence).com
^https?:\/\/talosintelligence.com\/
^https?:\/\/www.talosintelligence.com\/
^https?:\/\/www.talosintelligence.com
^https?:\/\/blog.talosintelligence.com\/
^https?:\/\/blog.talosintelligence.com
^https?:\/\/www.youtube.com\/
^https?:\/\/(?:[\w\-\_]+\.)+(?:snort).org\/
^https:\/\/(?:[\w\-\_]+\.)+(?:w3).org\/
^https?:\/\/www.linkedin.com\/
^https?:\/\/schema.org\/
^https?:\/\/(?:[\w\-\_]+\.)+(?:clamav).net\/
^https?:\/\/www.reddit.com\/
^https?:\/\/www.w3.org\/
^https?:\/\/twitter.com\/
^https?:\/\/www.facebook.com\/
^https?:\/\/snort.org\/
^https?:\/\/cisco.com\/
^https?:\/\/www.cisco.com\/
^https?:\/\/www.blogger.com
^https?:\/\/talosintel.com\/
^https?:\/\/www.talosintel.com
^https?:\/\/static.cloudflareinsights.com\/
^https?:\/\/www.spamcop.net\/
^https?:\/\/(?:[\w\-\_]+\.)+(?:blogblog).com\/
^https?:\/\/www.welivesecurity.com
^https?:\/\/www.sentinelone.com
^https?:\/\/Microsoft.net
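
The main script merges every whitelist file into one list, so host patterns are also tested against URLs and the other way round. A possible refinement is to keep the lists separate, keyed by the suffix of the file name. The sketch below assumes files named `whitelist_host.txt` and `whitelist_url.txt` (inferred from the `whitelist_*.txt` glob in the code, not confirmed) and uses a temporary directory in place of the real `whitelists/` folder; `load_whitelists` is a hypothetical helper.

```python
import glob
import os
import re
import tempfile


def load_whitelists(directory):
    """Return {kind: [compiled regex, ...]} read from whitelist_<kind>.txt files."""
    whitelists = {}
    for path in sorted(glob.glob(os.path.join(directory, "whitelist_*.txt"))):
        # "whitelist_host.txt" -> "host"
        kind = os.path.splitext(os.path.basename(path))[0].split("_", 1)[1]
        with open(path, encoding="utf-8") as handle:
            # Skip blank lines: an empty regex matches every string.
            whitelists[kind] = [re.compile(line.strip()) for line in handle if line.strip()]
    return whitelists


# Demo with two throwaway files standing in for the real whitelist directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "whitelist_host.txt"), "w", encoding="utf-8") as f:
        f.write("eset.com$\n\nkaspersky.com$\n")
    with open(os.path.join(tmp, "whitelist_url.txt"), "w", encoding="utf-8") as f:
        f.write("^https?:\\/\\/www.fireeye.com\\/\n")
    whitelists = load_whitelists(tmp)

print({kind: len(rules) for kind, rules in whitelists.items()})
```

With the lists separated, the host rules can be applied only to extracted domains and the URL rules only to extracted URLs, which reduces accidental matches.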