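• [Technical Insights] Data Scraping, Data Collection, Data Crawling, Data Analysis, and Data Mining
    1. Building an analysis model with big data technology
    Advances in big data technology have made data scraping, collection, and crawling easier and more efficient. Data analysis and mining rely on these techniques to obtain large volumes of raw data. The sections below describe how to choose suitable big data analysis and machine learning algorithms and build an analysis model.

    1.1 Data collection
    Data collection is the first step of the whole analysis process. Data has to be scraped, collected, and crawled from different sources, which may include websites, databases, and APIs. The quality and quantity of the collected data directly affect the results of later analysis and mining.

    1.2 Data cleaning
    Once collection is complete, the data must be cleaned to remove duplicate, erroneous, and irrelevant records. Cleaning is a critical part of data processing because it determines how accurate and reliable the final analysis results are.

    1.3 Data integration
    After cleaning, data from different sources is unified and organized into a consistent form. The purpose of this step is to make the later analysis work easier.

    2. Model building
    With the data collected and organized, the next step is to choose suitable big data analysis and machine learning algorithms and build the analysis model. Commonly used algorithms include decision trees, neural networks, and support vector machines.

    2.1 Choosing an algorithm
    Choosing the right algorithm is the key to building an analysis model, and the choice should follow from the actual problem and the characteristics of the data. For classification problems, decision trees or neural networks are possible choices; for regression problems, support vector machines (in their regression form) or linear regression are possible choices.

    2.2 Model training
    After an algorithm is chosen, the model is trained on the data. During training, the model's parameters are adjusted repeatedly to reach the best performance.

    2.3 Model evaluation
    After training, the model is evaluated to check its accuracy and its ability to generalize. Common evaluation metrics include accuracy, precision, recall, and the F1 score.

    3. Model optimization
    Evaluation may reveal problems such as overfitting or underfitting. These can be addressed with techniques such as regularization, cross-validation, and feature selection.

    Data scraping, collection, crawling, analysis, and mining form a complex process that depends on big data technology and machine learning algorithms. After collecting, cleaning, and integrating the data, a suitable algorithm can be chosen to build an analysis model, and the model can then be optimized to solve real problems.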
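    The cleaning step (1.2) and the evaluation step (2.3) can be illustrated with a small sketch. It is only an illustration under stated assumptions: the record layout (dicts with hypothetical `id` and `value` fields) and the function names `clean_records` and `binary_metrics` are not from the post, and a real project would more likely use a data-frame or machine learning library.

```python
def clean_records(records, required=("id", "value")):
    """Drop records with a missing required field, then drop repeats of an id.

    The field names in `required` are hypothetical; adapt them to the data.
    """
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Erroneous record: a required field is absent or None.
        if any(rec.get(field) is None for field in required):
            continue
        # Duplicate record: keep only the first occurrence of each id.
        if rec["id"] in seen_ids:
            continue
        seen_ids.add(rec["id"])
        cleaned.append(rec)
    return cleaned


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    raw = [
        {"id": 1, "value": 3.5},
        {"id": 1, "value": 3.5},   # duplicate id
        {"id": 2, "value": None},  # missing value
        {"id": 3, "value": 7.0},
    ]
    print(clean_records(raw))
    print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

    Precision and recall are reported separately because accuracy alone can look good on imbalanced data even when the positive class is mostly missed.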
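  • [Technical Insights] Precise Data Acquisition: Real-Time Capture of Prospective-User Data via Three-Carrier DPI and SDK
    In today's information society, data has become a core competitive asset for businesses, and obtaining useful data accurately and quickly is a concern in every industry. This post describes a method for acquiring prospective-user data in real time through DPI across the three carriers and through SDKs.

    The method is based on deep packet inspection (DPI): network traffic is captured and parsed in real time to extract key information. A business first gathers the web page links, mobile app names, mini-program names, keywords, and 400 numbers relevant to its industry, then builds a precise data model on top of the carriers' big data models. Customers' browsing and communication behavior is then analyzed to obtain their contact phone numbers together with attributes such as region, gender, visit frequency, and visit time.

    For example, industries such as financial lending, education and training, stock trading, futures, foreign exchange, business registration, POS terminals, air tickets, health supplements, baijiu, and e-commerce can all obtain highly targeted customer data in real time this way. Through SDK/DPI crawling, the approach can capture the phone numbers of visitors to a specified website or of users who log in to a specified app. Unlike penetration-sourced data, however, it does not cover a full range of categories; it basically contains only the phone number and the corresponding app.

    This method has broad application prospects. It can help businesses quickly obtain information about potential customers and improve marketing results. By analyzing customers' browsing and communication behavior, businesses can better understand customer needs and improve their products and services. The method can also be applied in finance, education, healthcare, and other sectors to improve regulatory oversight and protect the public interest. Precise data acquisition technology is expected to have a far-reaching effect on industries across China.

    The precise customer resources offered by three-carrier big data rest mainly on data mining and user behavior analysis. Carriers collect user data by various means, including real-time browsing behavior, app usage, and communication behavior. Data mining techniques are then applied to analyze this data in depth and extract user characteristics and behavior patterns. Finally, target users are identified by filtering on tags according to the business's requirements.

    Taking a mobile carrier as an example, its big data platform can track the behavioral footprint of all mobile users in real time and confirm intent from combined information such as searches, visits, app downloads, registrations, logins, SMS interactions, dialed calls, and purchase records. By analyzing this data, a business can locate its target customers and run precisely targeted marketing. For instance, an e-commerce company used a mobile carrier's big data platform to find users who had frequently searched for and browsed mobile phones on Taobao, JD.com, and similar platforms during the previous week, then served targeted ads to those users, which significantly increased product sales.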
  • [Help Request] FusionInsight_HD_8.2.0.1: select 'hello' in the Flink SQL client fails with KeeperErrorCode = ConnectionLoss for /flink_base/flink
    A select in the Flink SQL client still fails. Could someone point out what is wrong? Thanks.
    org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$SessionClosedRequireAuthException: KeeperErrorCode = Session closed because client failed to authenticate for /flink_base/flink
    or
    org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /flink_base/flink

    ZooKeeper is running at 192.168.0.82:24002 and the ACL permissions in ZooKeeper have been set, but setting the quota fails:
    [zk: 192.168.0.82:24002(CONNECTED) 2] create /flink_base/flink_base
    Created /flink_base/flink_base
    [zk: 192.168.0.82:24002(CONNECTED) 3] ls /flink_base/
    Path must not end with / character
    [zk: 192.168.0.82:24002(CONNECTED) 4] ls /flink_base
    [flink, flink_base]
    [zk: 192.168.0.82:24002(CONNECTED) 5]
    [zk: 192.168.0.82:24002(CONNECTED) 5]
    [zk: 192.168.0.82:24002(CONNECTED) 5]
    [zk: 192.168.0.82:24002(CONNECTED) 5] setquota -n 1000000 /flink_base/flink
    Insufficient permission : /flink_base/flink
    [zk: 192.168.0.82:24002(CONNECTED) 6] getAcl /flink_base/flink
    'world,'anyone
    : cdrwa
    [zk: 192.168.0.82:24002(CONNECTED) 7] setAcl /flink_base/flink world:anyone:rwcda
    [zk: 192.168.0.82:24002(CONNECTED) 8] setquota -n 1000000 /flink_base/flink
    Insufficient permission : /flink_base/flink
    [zk: 192.168.0.82:24002(CONNECTED) 9] getAcl /flink_base/
    Path must not end with / character
    [zk: 192.168.0.82:24002(CONNECTED) 10] getAcl /flink_base
    'world,'anyone
    : cdrwa
    [zk: 192.168.0.82:24002(CONNECTED) 11] getAcl /flink_base/flink
    'world,'anyone
    : cdrwa
    [zk: 192.168.0.82:24002(CONNECTED) 12] ls /zookeeper/quota
    [beeline, elasticsearch, flink_base, graphbase, hadoop, hadoop-adapter-data, hadoop-flag, hadoop-ha, hbase, hdfs-acl-log, hive, hiveserver2, kafka, loader, mr-ha, rmstore, sparkthriftserver, sparkthriftserver2x, sparkthriftserver2x_sparkInternal_HAMode, yarn-leader-election]
    [zk: 192.168.0.82:24002(CONNECTED) 13] ls /zookeeper/quota/flink_base
    [zookeeper_limits, zookeeper_stats]
    [zk: 192.168.0.82:24002(CONNECTED) 5] setquota -n 1000000 /flink_base/flink
    Insufficient permission : /flink_base/flink

    Log from tail -f /home/dmp/app/ficlient/Flink/flink/log/flink-root-sql-client-192-168-0-85.log:

    Complete flink-conf.yaml configuration:
    akka.ask.timeout: 120 s
    akka.client-socket-worker-pool.pool-size-factor: 1.0
    akka.client-socket-worker-pool.pool-size-max: 2
    akka.client-socket-worker-pool.pool-size-min: 1
    akka.framesize: 10485760b
    akka.log.lifecycle.events: false
    akka.lookup.timeout: 30 s
    akka.server-socket-worker-pool.pool-size-factor: 1.0
    akka.server-socket-worker-pool.pool-size-max: 2
    akka.server-socket-worker-pool.pool-size-min: 1
    akka.ssl.enabled: true
    akka.startup-timeout: 10 s
    akka.tcp.timeout: 60 s
    akka.throughput: 15
    blob.fetch.backlog: 1000
    blob.fetch.num-concurrent: 50
    blob.fetch.retries: 50
    blob.server.port: 32456-32520
    blob.service.ssl.enabled: true
    classloader.check-leaked-classloader: false
    classloader.resolve-order: child-first
    client.rpc.port: 32651-32720
    client.timeout: 120 s
    compiler.delimited-informat.max-line-samples: 10
    compiler.delimited-informat.max-sample-len: 2097152
    compiler.delimited-informat.min-line-samples: 2
    env.hadoop.conf.dir: /home/dmp/app/ficlient/Flink/flink/conf
    env.java.opts.client: -Djava.io.tmpdir=/home/dmp/app/ficlient/Flink/tmp
    env.java.opts.jobmanager: -Djava.security.krb5.conf=/opt/huawei/Bigdata/common/runtime/krb5.conf -Djava.io.tmpdir=${PWD}/tmp -Des.security.indication=true
    env.java.opts.taskmanager: -Djava.security.krb5.conf=/opt/huawei/Bigdata/common/runtime/krb5.conf -Djava.io.tmpdir=${PWD}/tmp -Des.security.indication=true
    env.java.opts: -Xloggc:<LOG_DIR>/gc.log -XX:+PrintGCDetails -XX:-OmitStackTraceInFastThrow -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=20 -XX:GCLogFileSize=20M -Djdk.tls.ephemeralDHKeySize=3072 -Djava.library.path=${HADOOP_COMMON_HOME}/lib/native -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv6Addresses=false -Dbeetle.application.home.path=/opt/huawei/Bigdata/common/runtime/security/config -Dwcc.configuration.path=/opt/huawei/Bigdata/common/runtime/security/config -Dscc.configuration.path=/opt/huawei/Bigdata/common/runtime/securityforscc/config -Dscc.bigdata.common=/opt/huawei/Bigdata/common/runtime
    env.yarn.conf.dir: /home/dmp/app/ficlient/Flink/flink/conf
    flink.security.enable: true
    flinkserver.alarm.cert.skip: true
    flinkserver.host.ip:
    fs.output.always-create-directory: false
    fs.overwrite-files: false
    heartbeat.interval: 10000
    heartbeat.timeout: 120000
    high-availability.job.delay: 10 s
    high-availability.storageDir: hdfs://hacluster/flink/recovery
    high-availability.zookeeper.client.acl: creator
    high-availability.zookeeper.client.connection-timeout: 90000
    high-availability.zookeeper.client.max-retry-attempts: 5
    high-availability.zookeeper.client.retry-wait: 5000
    high-availability.zookeeper.client.session-timeout: 90000
    high-availability.zookeeper.client.tolerate-suspended-connections: true
    high-availability.zookeeper.path.root: /flink
    high-availability.zookeeper.path.under.quota: /flink_base
    high-availability.zookeeper.quorum: 192.168.0.82:24002,192.168.0.81:24002,192.168.0.80:24002
    high-availability.zookeeper.quota.enabled: true
    high-availability: zookeeper
    job.alarm.enable: true
    jobmanager.heap.size: 1024mb
    jobmanager.web.403-redirect-url: https://192.168.0.82:28443/web/pages/error/403.html
    jobmanager.web.404-redirect-url: https://192.168.0.82:28443/web/pages/error/404.html
    jobmanager.web.415-redirect-url: https://192.168.0.82:28443/web/pages/error/415.html
    jobmanager.web.500-redirect-url: https://192.168.0.82:28443/web/pages/error/500.html
    jobmanager.web.access-control-allow-origin: *
    jobmanager.web.accesslog.enable: true
    jobmanager.web.allow-access-address: *
    jobmanager.web.backpressure.cleanup-interval: 600000
    jobmanager.web.backpressure.delay-between-samples: 50
    jobmanager.web.backpressure.num-samples: 100
    jobmanager.web.backpressure.refresh-interval: 60000
    jobmanager.web.cache-directive: no-store
    jobmanager.web.checkpoints.disable: false
    jobmanager.web.checkpoints.history: 10
    jobmanager.web.expires-time: 0
    jobmanager.web.history: 5
    jobmanager.web.logout-timer: 600000
    jobmanager.web.pragma-value: no-cache
    jobmanager.web.refresh-interval: 3000
    jobmanager.web.ssl.enabled: false
    jobmanager.web.x-frame-options: DENY
    library-cache-manager.cleanup.interval: 3600
    metrics.internal.query-service.port: 28844-28943
    metrics.reporter.alarm.factory.class: com.huawei.mrs.flink.alarm.FlinkAlarmReporterFactory
    metrics.reporter.alarm.interval: 30 s
    metrics.reporter.alarm.job.alarm.checkpoint.consecutive.failures.num: 5
    metrics.reporter.alarm.job.alarm.failure.restart.rate: 80
    metrics.reporter.alarm.job.alarm.task.backpressure.duration: 180 s
    metrics.reporter: alarm
    nettyconnector.message.delimiter: $_
    nettyconnector.registerserver.topic.storage: /flink/nettyconnector
    nettyconnector.sinkserver.port.range: 28444-28843
    nettyconnector.ssl.enabled: false
    parallelism.default: 1
    query.client.network-threads: 0
    query.proxy.network-threads: 0
    query.proxy.ports: 32541-32560
    query.proxy.query-threads: 0
    query.server.network-threads: 0
    query.server.ports: 32521-32540
    query.server.query-threads: 0
    resourcemanager.taskmanager-timeout: 300000
    rest.await-leader-timeout: 30000
    rest.bind-port: 32261-32325
    rest.client.max-content-length: 104857600
    rest.connection-timeout: 15000
    rest.idleness-timeout: 300000
    rest.retry.delay: 3000
    rest.retry.max-attempts: 20
    rest.server.max-content-length: 104857600
    rest.server.numThreads: 4
    restart-strategy.failure-rate.delay: 10 s
    restart-strategy.failure-rate.failure-rate-interval: 60 s
    restart-strategy.failure-rate.max-failures-per-interval: 1
    restart-strategy.fixed-delay.attempts: 3
    restart-strategy.fixed-delay.delay: 10 s
    restart-strategy: none
    security.cookie: 9477298cd52a3e409ed0bc570bdc795179fcc7c301a1225e22f47fe0a3db47c2
    security.enable: true
    security.kerberos.login.contexts: Client,KafkaClient
    security.kerberos.login.keytab:
    security.kerberos.login.principal:
    security.kerberos.login.use-ticket-cache: true
    security.networkwide.listen.restrict: true
    security.ssl.algorithms: TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    security.ssl.enabled: false
    security.ssl.encrypt.enabled: false
    security.ssl.key-password: Bapuser@9000
    security.ssl.keystore-password: Bapuser@9000
    security.ssl.keystore: ssl/flink.keystore
    security.ssl.protocol: TLSv1.2
    security.ssl.rest.enabled: false
    security.ssl.truststore-password: Bapuser@9000
    security.ssl.truststore: ssl/flink.truststore
    security.ssl.verify-hostname: false
    slot.idle.timeout: 50000
    slot.request.timeout: 300000
    state.backend.fs.checkpointdir: hdfs://hacluster/flink/checkpoints
    state.backend.fs.memory-threshold: 20kb
    state.backend.incremental: true
    state.backend: rocksdb
    state.savepoints.dir: hdfs://hacluster/flink/savepoint
    task.cancellation.interval: 30000
    task.cancellation.timeout: 180000
    taskmanager.data.port: 32391-32455
    taskmanager.data.ssl.enabled: false
    taskmanager.debug.memory.logIntervalMs: 0
    taskmanager.debug.memory.startLogThread: false
    taskmanager.heap.size: 1024mb
    taskmanager.initial-registration-pause: 500 ms
    taskmanager.max-registration-pause: 30 s
    taskmanager.maxRegistrationDuration: 5 min
    taskmanager.memory.fraction: 0.7
    taskmanager.memory.off-heap: false
    taskmanager.memory.preallocate: false
    taskmanager.memory.segment-size: 32768
    taskmanager.network.detailed-metrics: false
    taskmanager.network.memory.buffers-per-channel: 2
    taskmanager.network.memory.floating-buffers-per-gate: 8
    taskmanager.network.memory.fraction: 0.1
    taskmanager.network.memory.max: 1gb
    taskmanager.network.memory.min: 64mb
    taskmanager.network.netty.client.connectTimeoutSec: 300
    taskmanager.network.netty.client.numThreads: -1
    taskmanager.network.netty.num-arenas: -1
    taskmanager.network.netty.sendReceiveBufferSize: 4096
    taskmanager.network.netty.server.backlog: 0
    taskmanager.network.netty.server.numThreads: -1
    taskmanager.network.netty.transport: nio
    taskmanager.network.numberOfBuffers: 2048
    taskmanager.network.request-backoff.initial: 100
    taskmanager.network.request-backoff.max: 10000
    taskmanager.numberOfTaskSlots: 1
    taskmanager.refused-registration-pause: 10 s
    taskmanager.registration.timeout: 5 min
    taskmanager.rpc.port: 32326-32390
    taskmanager.runtime.hashjoin-bloom-filters: false
    taskmanager.runtime.max-fan: 128
    taskmanager.runtime.sort-spilling-threshold: 0.8
    use.path.filesystem: true
    use.smarterleaderlatch: true
    web.submit.enable: false
    web.timeout: 10000
    yarn.application-attempt-failures-validity-interval: 600000
    yarn.application-attempts: 5
    yarn.application-master.port: 32586-32650
    yarn.heap-cutoff-min: 384
    yarn.heap-cutoff-ratio: 0.25
    yarn.heartbeat-delay: 5
    yarn.heartbeat.container-request-interval: 500
    yarn.maximum-failed-containers: 5
    yarn.per-job-cluster.include-user-jar: ORDER
    zk.ssl.enabled: false
    zookeeper.clientPort.quorum: 192.168.0.82:24002,192.168.0.81:24002,192.168.0.80:24002
    zookeeper.root.acl: OPEN
    zookeeper.sasl.disable: false
    zookeeper.sasl.login-context-name: Client
    zookeeper.sasl.service-name: zookeeper
    zookeeper.secureClientPort.quorum: 192.168.0.82:24002,192.168.0.81:24002,192.168.0.80:24002
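    A dump this long is easier to review when the ZooKeeper and Kerberos settings are pulled out programmatically. The sketch below is a heuristic aid, not a confirmed diagnosis: the thread contains no resolution, and the function names `parse_flink_conf`, `zk_auth_findings`, and `zk_ha_path` are made up for illustration. It assumes the file uses one `key: value` pair per line, treats an unset `zookeeper.sasl.disable` as false and an unset `security.kerberos.login.use-ticket-cache` as true, and assumes the ZooKeeper path is the quota path followed by the root path, which happens to match the `/flink_base/flink` path in the error messages.

```python
def parse_flink_conf(text):
    """Parse 'key: value' lines into a dict; keys with no value map to ''."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as hdfs://... survive.
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def zk_auth_findings(conf):
    """Return human-readable notes about how the client would authenticate."""
    findings = []
    sasl_enabled = conf.get("zookeeper.sasl.disable", "false").lower() != "true"
    keytab = conf.get("security.kerberos.login.keytab", "")
    principal = conf.get("security.kerberos.login.principal", "")
    use_cache = conf.get(
        "security.kerberos.login.use-ticket-cache", "true").lower() == "true"
    if sasl_enabled and not (keytab and principal):
        if use_cache:
            findings.append(
                "SASL is enabled and no keytab/principal is set: the client "
                "depends on a valid Kerberos ticket cache for the OS user.")
        else:
            findings.append(
                "SASL is enabled but no keytab/principal or ticket cache "
                "is configured.")
    return findings


def zk_ha_path(conf):
    """Assumed layout: <path.under.quota><path.root>."""
    return (conf.get("high-availability.zookeeper.path.under.quota", "")
            + conf.get("high-availability.zookeeper.path.root", ""))


if __name__ == "__main__":
    sample = """
    high-availability.zookeeper.path.root: /flink
    high-availability.zookeeper.path.under.quota: /flink_base
    security.kerberos.login.keytab:
    security.kerberos.login.principal:
    security.kerberos.login.use-ticket-cache: true
    zookeeper.sasl.disable: false
    """
    conf = parse_flink_conf(sample)
    print(zk_ha_path(conf))
    for note in zk_auth_findings(conf):
        print(note)
```

    In the posted dump the keytab and principal values appear empty while SASL is left enabled, so checking whether the user running the SQL client holds a valid Kerberos ticket would be one reasonable first step.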
  • [Help Request] FusionInsight_HD_8.2.0.1: select 'hello' in the Flink SQL client fails with KeeperErrorCode = ConnectionLoss for /flink_base/flink
    1. Running SQL in the Flink SQL client fails immediately:
    [ERROR] Could not execute SQL statement. Reason:
    org.apache.flink.shaded.zookeeper3.org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /flink_base/flink

    2. Querying from inside ZooKeeper also fails. Any help is appreciated.
    [omm@192-168-0-82 zookeeper]$ pwd
    /opt/huawei/Bigdata/FusionInsight_HD_8.2.0.1/install/FusionInsight-Zookeeper-3.6.3/zookeeper
    [omm@192-168-0-82 zookeeper]$ bin/zkCli.sh -server 192.168.0.82:24002
    Connecting to 192.168.0.82:24002
    Welcome to ZooKeeper!
    JLine support is enabled

    WATCHER::

    WatchedEvent state:SyncConnected type:None path:null
    [zk: 192.168.0.82:24002(CONNECTING) 0] ls /
    KeeperErrorCode = Session closed because client failed to authenticate for /
    [zk: 192.168.0.82:24002(CONNECTED) 1]
    WATCHER::

    WatchedEvent state:Disconnected type:None path:null

    WATCHER::

    WatchedEvent state:SyncConnected type:None path:null

    WATCHER::

    WatchedEvent state:Disconnected type:None path:null

    After that, the WATCHER:: events keep repeating. Part of the flink-conf.yaml settings:
    flink.security.enable: true
    flinkserver.alarm.cert.skip: true
    flinkserver.host.ip:
    fs.output.always-create-directory: false
    fs.overwrite-files: false
    heartbeat.interval: 10000
    heartbeat.timeout: 120000
    high-availability.job.delay: 10 s
    high-availability.storageDir: hdfs://hacluster/flink/recovery
    high-availability.zookeeper.client.acl: creator
    high-availability.zookeeper.client.connection-timeout: 90000
    high-availability.zookeeper.client.max-retry-attempts: 5
    high-availability.zookeeper.client.retry-wait: 5000
    high-availability.zookeeper.client.session-timeout: 90000
    high-availability.zookeeper.client.tolerate-suspended-connections: true
    high-availability.zookeeper.path.root: /flink
    high-availability.zookeeper.path.under.quota: /flink_base
    high-availability.zookeeper.quorum: 192.168.0.82:24002,192.168.0.81:24002,192.168.0.80:24002
    high-availability.zookeeper.quota.enabled: true
    high-availability: zookeeper
    yarn.application-attempts: 5
    yarn.application-master.port: 32586-32650
    yarn.heap-cutoff-min: 384
    yarn.heap-cutoff-ratio: 0.25
    yarn.heartbeat-delay: 5
    yarn.heartbeat.container-request-interval: 500
    yarn.maximum-failed-containers: 5
    yarn.per-job-cluster.include-user-jar: ORDER
    zk.ssl.enabled: false
    zookeeper.clientPort.quorum: 192.168.0.82:24002,192.168.0.81:24002,192.168.0.80:24002
    zookeeper.root.acl: OPEN
    zookeeper.sasl.disable: false
    zookeeper.sasl.login-context-name: Client
    zookeeper.sasl.service-name: zookeeper
    zookeeper.secureClientPort.quorum: 192.168.0.82:24002,192.168.0.81:24002,192.168.0.80:24002
  • [API Integration and Orchestration] Why is my GDE completely empty?
    What could be causing this? Asking for help here.
  • [Help Request] bigdata
    I want to learn big data. Does anyone have recommendations?
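  • [Other] [Upgrade] Cluster upgrade fails (42%): SCP reports /opt/huawei/Bigdata/mppdb/gtm/gtm.control
    [Problem description] Upgrading the cluster from GaussDB A C80SPC300 to GaussDB A 8.0.0.1 fails at 42% with: SCP /opt/huawei/Bigdata/mppdb/gtm/gtm.control_b: Permission denied
    [Root cause] Frontline staff backed up /opt/huawei/Bigdata/mppdb/gtm/gtm.control as the root user, which left a file not owned by omm in the /opt/huawei/Bigdata/mppdb/gtm/ directory.
    [Solution] Change the owner and group of gtm.control_b to omm:wheel. This resolved the problem.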
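    Before retrying an upgrade, a directory can be checked for stray files owned by another user. The following is a minimal sketch rather than a GaussDB tool: the function name `files_not_owned_by_uid` is hypothetical, it inspects only the top level of the directory, and it assumes a POSIX system.

```python
import os
import pwd


def files_not_owned_by_uid(directory, expected_uid):
    """List (path, owner) for direct entries whose owner differs from expected_uid."""
    offenders = []
    with os.scandir(directory) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        # Do not follow symlinks: the owner of the link itself is what matters.
        uid = entry.stat(follow_symlinks=False).st_uid
        if uid != expected_uid:
            try:
                owner = pwd.getpwuid(uid).pw_name
            except KeyError:
                owner = str(uid)  # uid without a passwd entry
            offenders.append((entry.path, owner))
    return offenders


if __name__ == "__main__":
    # Example: check the current directory against the current user.
    for path, owner in files_not_owned_by_uid(".", os.getuid()):
        print(f"{path} is owned by {owner}")
```

    On the affected node this could be called with the directory from the post and `pwd.getpwnam("omm").pw_uid`. Any file it reports can then be fixed as the post describes, for example with `chown omm:wheel` or Python's `shutil.chown(path, user="omm", group="wheel")`, run by a user with sufficient privileges.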