The Application of AI Spark Big Model in Natural Language Processing (NLP)


Introduction

Text analysis is one of the most fundamental processes in Natural Language Processing (NLP); it entails extracting valuable insights and information from text data (Cecchini, 2023). With the increasing complexity and volume of text data, the efficiency and scalability of the methods used become crucial. Cecchini (2023) describes Spark NLP as a high-performance library built on Apache Spark, with a Python API, that provides a complete solution for text-data processing. Apache Spark is an open-source framework used to manage and process data in machine-learning tasks, and it has several properties that make it well suited to machine learning (Tiwari, 2023). This paper discusses the main features and uses of the AI Spark Big Model that allow for the generation of meaningful insights from data, focusing explicitly on Apache Spark, as it provides a robust, distributed computing framework.

Key Features of Apache Spark

Apache Spark is an open-source cluster-computing framework used for big-data workloads. It was designed to address the shortcomings of MapReduce by processing data in memory, minimizing the number of phases in a task, and reusing data across parallel operations (Tang et al., 2020). According to the Survey Point Team (2023), Apache Spark is more effective than MapReduce because it promotes efficient use of resources and enables tasks to be performed concurrently, resulting in accelerated data processing. Spark reuses data through an in-memory cache, which significantly accelerates machine-learning algorithms that invoke a function on the same data repeatedly (Adesokan, 2020). Data reuse is achieved by creating DataFrames, an abstraction over the Resilient Distributed Dataset (RDD): a collection of objects cached in memory and reused across multiple Spark operations. This greatly reduces latency, making Spark several times faster than MapReduce, especially for machine learning and interactive analysis.

Apache Spark provides high-level application programming interfaces in Java, Scala, Python, and R, and, beyond in-memory caching, it heavily optimizes query execution for fast analytic queries over data of any size (Gour, 2018). Spark includes an optimized engine that executes a general graph of computations, together with a set of high-level tools for working with structured data, machine learning, graph processing, and streaming data. Accordingly, the Apache Spark stack comprises several primary components: Spark Core, Spark SQL, Spark Streaming, MLlib for machine learning, GraphX for graph processing, and SparkR (Stan et al., 2019).

Apache Spark has considerable features that make it stand out among big-data processing tools. First, the tool is fault-tolerant and can therefore produce correct results even when a worker node fails (Stan et al., 2019). Spark achieves this fault tolerance through Directed Acyclic Graphs (DAGs) and Resilient Distributed Datasets (RDDs). Every transformation and action applied to a task is recorded in the DAG, and if a worker node fails, the same transformations can be replayed from the DAG to reproduce the results (Rajpurohit et al., 2023). Second, Spark is constantly evolving: Salloum et al. (2016) explain that Spark is dynamic, with over 80 high-level operators that assist in developing parallel applications. Another distinctive characteristic of Spark is lazy evaluation: a transformation is merely recorded and inserted into the DAG, and the actual computation occurs only when an action is called (Salloum et al., 2016). Lazy evaluation lets Spark make optimization decisions over its transformations, since every operation becomes visible to the Spark engine before any action runs, which is beneficial for optimizing data-processing tasks.
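To make lazy evaluation and caching concrete, here is a minimal PySpark sketch; the input file name `corpus.txt` is a hypothetical example. The transformations are only recorded in the DAG, execution happens when an action such as `count()` or `show()` is called, and `cache()` marks the intermediate result for in-memory reuse by the second action.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("LazyEvaluationDemo").getOrCreate()

# Transformations below are only recorded in the DAG; nothing executes yet.
lines = spark.read.text("corpus.txt")  # hypothetical input file
tokens = lines.selectExpr("explode(split(value, ' ')) AS token")
counts = tokens.groupBy("token").count()
counts.cache()  # mark the result for in-memory reuse

# Actions trigger execution of the optimized plan; the second action
# reuses the cached result instead of recomputing the whole DAG.
print(counts.count())
counts.orderBy("count", ascending=False).show(10)
```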

Another important aspect of this tool is real-time stream processing, which enables users to write streaming jobs the same way they write batch jobs (Sahal et al., 2020). This real-time capability, along with Spark's speed, lets applications on Hadoop run up to 100 times faster in memory and up to 10 times faster on disk by avoiding disk read/write operations for intermediate results (Sahal et al., 2020). Moreover, Spark's reusability allows the same code to serve batch processing, joining streams against historical data, and running ad-hoc queries on stream state. Spark also offers strong analytics: its machine-learning and graph-processing libraries are applied across industries to solve complex problems, aided by platforms such as Databricks (Stan et al., 2019). In-memory computing further improves performance by executing tasks in memory and storing results for iterative computations. Spark provides interfaces in Java, Scala, Python, and R for data analysis, and Spark SQL for SQL operations (Stan et al., 2019). Spark can be combined with Hadoop, reading and writing data to HDFS in various file formats, which makes it suitable for diverse inputs and outputs. Finally, Spark is open-source software with no license fees, making it cheaper to adopt; it integrates stream processing, machine learning, and graph processing in one system and avoids vendor lock-in.
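A small illustration of writing streaming jobs like batch jobs: the sketch below, assuming a hypothetical directory `data/` of text files, applies the identical word-count function to a static read and to a streaming read of the same source; only the input and output plumbing differ.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, split

spark = SparkSession.builder.appName("StreamLikeBatch").getOrCreate()

def word_counts(df):
    # Identical transformation logic for batch and streaming DataFrames.
    return (df.select(explode(split(col("value"), "\\s+")).alias("word"))
              .groupBy("word")
              .count())

# Batch: a static read of the text files in data/.
word_counts(spark.read.text("data/")).show()

# Streaming: the same function over a streaming read of that directory.
query = (word_counts(spark.readStream.text("data/"))
         .writeStream.outputMode("complete").format("console").start())
query.awaitTermination(30)  # let the demo run briefly, then return
```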

Spark NLP is the fastest open-source NLP library. Steller (2024) reports that Spark NLP is 38 and 80 times faster than spaCy, with the same accuracy for training custom models. Spark NLP is the only open-source NLP library that can use a distributed Spark cluster: it is a native Spark ML library that operates on DataFrames, Spark's native data structure, so speedups on a cluster yield yet another order of magnitude of performance improvement (Steller, 2024). In addition to high performance, Spark NLP provides high accuracy for a growing range of NLP applications, and its team tracks the current literature to release state-of-the-art models.
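As an illustration of Spark NLP operating natively on Spark ML and DataFrames, here is a minimal pipeline sketch using its DocumentAssembler, Tokenizer, and Normalizer stages; it assumes the `spark-nlp` package is installed and that `sparknlp.start()` can launch a session.

```python
import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, Normalizer
from pyspark.ml import Pipeline

spark = sparknlp.start()  # starts a Spark session with Spark NLP loaded

data = spark.createDataFrame(
    [("Spark NLP runs natively on Spark ML and DataFrames.",)], ["text"])

# Each annotator reads and writes DataFrame columns of annotations.
document = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
normalizer = Normalizer().setInputCols(["token"]).setOutputCol("normal")

pipeline = Pipeline(stages=[document, tokenizer, normalizer])
result = pipeline.fit(data).transform(data)
result.selectExpr("normal.result").show(truncate=False)
```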

The Application of Spark Big Model in NLP

1. Sentiment Analysis

One of the tasks the Apache Spark model performs in sentiment analysis is data processing and preparation. Zucco et al. (2020) assert that sentiment analysis has become one of the most effective tools for companies to leverage social sentiment related to their brand, product, or service. Humans identify emotional tones in text naturally; for large-scale text preprocessing, however, Apache Spark is the best fit because of its efficiency in handling big data (Verma et al., 2020). This capability is critical in AI and machine learning, since preprocessing is a significant step. Spark's distributed computing framework can tokenize text data, breaking it down into manageable units of words or tokens. Stemming can then be carried out after tokenization to reduce words to their base or root forms, which helps normalize the text. The other significant preprocessing task is feature extraction, which converts text into formats that machine-learning algorithms can work on. Because Spark distributes these operations across a cluster, the preprocessing tasks run in parallel, improving scalability and performance (Shetty, 2021). This parallelism reduces processing time and makes it possible to handle datasets that would be infeasible for conventional single-node processing frameworks. Applying Spark to text preprocessing therefore ensures organizations have their data ready before feeding it to machine-learning and AI models for training, especially as more applications deal with large volumes of data.
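A minimal sketch of the preprocessing stages described above (tokenization, stop-word removal, and a stand-in for stemming) using Spark MLlib transformers. The input rows and the naive suffix-stripping function are hypothetical illustrations; MLlib itself ships no stemmer, so a real job would call an NLP library instead.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, StringType
from pyspark.ml.feature import RegexTokenizer, StopWordsRemover

spark = SparkSession.builder.appName("TextPreprocessing").getOrCreate()

# Hypothetical input: raw review texts to be cleaned in parallel.
reviews = spark.createDataFrame(
    [("The shipping was delayed badly",),
     ("Great phones and great prices",)], ["text"])

tokenizer = RegexTokenizer(inputCol="text", outputCol="tokens", pattern="\\W+")
remover = StopWordsRemover(inputCol="tokens", outputCol="filtered")

# Deliberately naive suffix stripping as a placeholder for real stemming.
def strip_suffixes(words):
    return [w[:-1] if w.endswith("s") else w for w in words]

stem = udf(strip_suffixes, ArrayType(StringType()))

tokens = tokenizer.transform(reviews)
filtered = remover.transform(tokens)
filtered.withColumn("stems", stem(col("filtered"))).show(truncate=False)
```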

The second activity that the Apache Spark model carries out in sentiment analysis is feature engineering. Dey (2024) notes that PySpark is an open-source, large-scale data-processing framework built on Apache Spark; it provides many functions and classes for data cleaning, summarization, transformation, normalization, feature engineering, and model construction. Apache Spark's MLlib likewise offers a stable environment for feature extraction and transformation within its ML algorithms, which is important for NLP feature engineering. The first such technique is TF-IDF (Term Frequency-Inverse Document Frequency), which transforms textual data into numeric vectors based on how often a word appears in a document relative to how often it appears across the document collection (Sintia et al., 2021). This helps determine the significance of each word and is particularly important for reducing the impact of stop words, that is, words that appear very frequently but contribute least to meaningful analysis. In addition, embedding models such as Word2Vec generate dense vectors for words that capture the semantics defined by the surrounding text; Word2Vec maps similar words close together in vector space, enhancing the model's general knowledge of the language. Spark's MLlib assists in converting raw text into such vectors, which helps produce richer and more accurate machine-learning models, particularly for tasks such as sentiment analysis of textual data.
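Both techniques can be sketched directly with MLlib transformers. The example below (toy sentences, hypothetical parameter choices) computes TF-IDF vectors via HashingTF and IDF, and trains a small Word2Vec model on the same tokenized column.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, HashingTF, IDF, Word2Vec

spark = SparkSession.builder.appName("FeatureEngineering").getOrCreate()

docs = spark.createDataFrame(
    [("spark handles big data",),
     ("spark mllib builds features from text",)], ["text"])

words = Tokenizer(inputCol="text", outputCol="words").transform(docs)

# TF-IDF: term frequency weighted down by how common a term is corpus-wide.
tf = HashingTF(inputCol="words", outputCol="tf", numFeatures=1024).transform(words)
tfidf = IDF(inputCol="tf", outputCol="tfidf").fit(tf).transform(tf)

# Word2Vec: dense vectors placing semantically similar words close together;
# Spark's implementation also averages them into one vector per document.
w2v = Word2Vec(vectorSize=50, minCount=1, inputCol="words", outputCol="w2v")
result = w2v.fit(tfidf).transform(tfidf)
result.select("tfidf", "w2v").show(truncate=False)
```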

The Apache Spark model is also applied to training and evaluation for sentiment analysis. Apache Spark is particularly appropriate for training sentiment-analysis models because many algorithms are available, from basic ones such as logistic regression and decision trees to complex ones such as LSTM networks (Raviya & Vennila, 2021). These models can be trained in parallel across multiple nodes with Spark's distributed computing, which removes the time constraints of single-machine computation. This parallelization is most useful when the training set is large, because it fully utilizes computational capacity and shortens training time. Spark's MLlib provides reliable implementations of these algorithms and more, so data scientists can switch between models based on the problem's complexity and the task's requirements (Raviya & Vennila, 2021). Spark also provides cross-validation and other evaluation utilities as integrated tools for model checking, enabling models to be assessed and improved for high accuracy and good generalizability. Spark has thus been shown to train and test large-scale sentiment-analysis models effectively, which benefits organizations since Spark is distributed by nature.
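A sketch of distributed training and evaluation with MLlib: a logistic-regression sentiment pipeline tuned by cross-validation over a small regularization grid. The labeled rows are toy data; a real job would read a large labeled corpus from distributed storage.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.appName("SentimentTraining").getOrCreate()

# Toy labeled data: 1.0 = positive, 0.0 = negative sentiment.
train = spark.createDataFrame(
    [("love this phone", 1.0), ("worst purchase ever", 0.0),
     ("really happy with it", 1.0), ("broke after a week", 0.0)],
    ["text", "label"])

lr = LogisticRegression(maxIter=20)
pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="features"),
    lr,
])

# Distributed cross-validation over a small regularization grid.
grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(), numFolds=2)
model = cv.fit(train)
print(model.avgMetrics)  # mean AUC for each grid point
```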

2. Machine Translation

Apache Spark is very useful for managing the large-scale bilingual corpora required for machine-translation tasks and model training. Its added advantage for such complex tasks is its distributed computing environment. Spark aligns bilingual sentence pairs so that they correspond, a vital step in corpus alignment that machine-translation models rely on to learn correct translations (Cutrona, 2021). Notably, these alignment tasks can be parallelized using Spark's distributed DataFrames and RDDs, significantly accelerating the process. Tokenization segments text into words or subwords, and Spark makes this faster by partitioning the data and distributing it across nodes, especially for extensive datasets. Likewise, all cleaning procedures, such as lowercasing text and handling special characters, are performed with Spark's functions and utilities. Spark distributes these preprocessing operations so the data is prepared as well and as quickly as possible for subsequent training of machine-translation models in frameworks such as TensorFlow or PyTorch, integrated with Spark through libraries such as Apache Spark MLlib and TensorFlowOnSpark.
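The cleaning and alignment-filtering steps might look like the following DataFrame sketch. The sentence pairs are toy examples, and the length-ratio filter is one simple, hypothetical heuristic for dropping badly aligned pairs.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lower, regexp_replace, size, split

spark = SparkSession.builder.appName("ParallelCorpusPrep").getOrCreate()

# Hypothetical aligned sentence pairs (source, target).
pairs = spark.createDataFrame(
    [("Hello, world!", "Hallo, Welt!"),
     ("Spark scales out.", "Spark skaliert horizontal.")],
    ["src", "tgt"])

def clean(c):
    # Lowercase and strip punctuation; runs in parallel on every partition.
    return regexp_replace(lower(c), r"[^\p{L}\p{N}\s]", "")

cleaned = (pairs.withColumn("src", clean(col("src")))
                .withColumn("tgt", clean(col("tgt")))
                .withColumn("src_tok", split(col("src"), r"\s+"))
                .withColumn("tgt_tok", split(col("tgt"), r"\s+"))
                # Drop pairs whose token counts diverge too much to be aligned.
                .filter(size(col("src_tok")) <= 2 * size(col("tgt_tok")))
                .filter(size(col("tgt_tok")) <= 2 * size(col("src_tok"))))
cleaned.show(truncate=False)
```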

Apache Spark also enhances the training of NMT models and other complicated architectures, such as sequence-to-sequence models with attention mechanisms, through distributed computing (Prats et al., 2020). Spark can be interfaced with deep-learning frameworks such as TensorFlow, Keras, and PyTorch, which helps divide computations among the nodes of a cluster. This distribution is made possible by Spark's RDDs and DataFrames, used to host and process big data. Spark distributes the input sequences, gradients, and model parameters across the nodes during training, which is faster than a single machine and accommodates datasets too large for one machine. Furthermore, Spark can be connected to GPU clusters through libraries such as TensorFlowOnSpark or BigDL, which improve the training process with hardware acceleration (Lunga et al., 2020). Organizations can thus cut training time and refine their models for higher translation accuracy. This capability is essential for building accurate NMT systems that generate correct translations, which matter in communication applications and document translation.
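Spark's part of such a workflow can be as simple as sharding the prepared corpus for downstream trainers. The sketch below (paths are hypothetical) repartitions a cleaned parallel corpus and writes Parquet shards that TensorFlow or PyTorch data loaders, or connectors such as TensorFlowOnSpark, can consume; the deep-learning training loop itself is outside this sketch.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ShardForNMT").getOrCreate()

# Hypothetical cleaned sentence pairs produced by the previous step.
corpus = spark.read.parquet("clean_pairs.parquet")

# One shard per training worker: repartition, then write Parquet files
# that the deep-learning framework's data loaders read in parallel.
(corpus.repartition(64)
       .write.mode("overwrite")
       .parquet("nmt_shards/"))
```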

3. Text Generation

Apache Spark is used to train many language-generation models for text-generation tasks, from RNNs to the latest transformer models such as GPT (Myers et al., 2024). The first benefit of using Spark is that its distributed computing system raises training throughput, since computations run in parallel across the nodes of a cluster. This distributed approach significantly cuts the time required to train large, complex models and allows for processing datasets that cannot be handled on a single machine. According to Myers et al. (2024), Spark's solid foundation and effectiveness ensure efficient use of resources and make it possible to scale up the training of language models that are contextually appropriate and capable of generating semantically coherent, meaningful text.

Further, Apache Spark is also beneficial for processing the enormous quantities of data needed to train language models, again owing to distributed computing. This efficiency starts with data loading: Spark can read extensive text data in parallel from different sources, shortening load times (Myers et al., 2024). Operations performed before feeding text data to the models, such as tokenization, normalization, and feature extraction, run in parallel across all nodes so the text is prepared for modeling efficiently. During the training phase, Spark's DataFrame abstraction distributes the computations, enabling the management of large datasets; this makes it possible to train complex language models, such as RNNs and Transformers, without running out of memory or wasting processing time. Spark's framework also allows distributed model assessment, so performance metrics and validation checks can be calculated over the distributed data at once, keeping them accurate. Spark can thus scale the entire text-generation workflow, including data loading, preprocessing, and model testing, making it fit for large-scale NLP tasks.
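As one example of distributed model assessment, the sketch below aggregates a corpus-level metric (average negative log-likelihood per token) over a DataFrame of hypothetical per-document scores that a generation model might emit during validation.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col

spark = SparkSession.builder.appName("DistributedEval").getOrCreate()

# Hypothetical per-document scores written out during validation.
scores = spark.createDataFrame(
    [("doc1", -42.7, 19), ("doc2", -18.3, 8), ("doc3", -55.1, 24)],
    ["doc_id", "log_likelihood", "n_tokens"])

# Corpus-level average negative log-likelihood per token, computed
# in parallel across the cluster rather than on a single machine.
metric = scores.select(
    (-col("log_likelihood") / col("n_tokens")).alias("nll_per_token"))
metric.agg(avg("nll_per_token").alias("mean_nll")).show()
```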

Conclusion

Apache Spark has proven to be an effective tool for managing and processing data compared with other tools. It supports language models that generate text in real time, enabling functions such as chatbots, content generation, and automatic report generation. This is well supported by Spark's in-memory computing, which allows models to read and process data without the delay of disk I/O operations. Spark also optimizes memory to cache intermediate results and other frequently used data, so text-generation tasks complete with fast response times and give users a smooth experience. This high-performance environment suits the real-time needs of interactive applications, making it possible to provide timely and relevant text outputs to users. With these capabilities, Spark enables the realistic application of state-of-the-art text-generation technologies across many use cases. Finally, Spark NLP offers Python, Java, and Scala libraries that contain the features of traditional NLP libraries such as spaCy, NLTK, Stanford CoreNLP, and OpenNLP, along with further features such as spell checking, sentiment analysis, and document categorization; it advances beyond previous attempts by offering the best combination of accuracy, speed, and scalability.

References

  1. Adesokan, A. (2020). Performance analysis of Hadoop MapReduce and Apache Spark for big data.
  2. Cecchini, D. (2023). Scaling up text analysis: Best practices with Spark NLP n-gram generation. Medium. https://medium.com/john-snow-labs/scaling-up-text-analysis-best-practices-with-spark-nlp-n-gram-generation-b8292b4c782d
  3. Cutrona, V. (2021). Semantic table annotation for large-scale data enrichment.
  4. Dey, R. (2024). Feature engineering in PySpark: Techniques for data transformation and model improvement. Medium. https://medium.com/@roshmitadey/feature-engineering-in-pyspark-techniques-for-data-transformation-and-model-improvement-30c0cda4969f
  5. Gour, R. (2018). Apache Spark ecosystem — Complete Spark components guide. Medium. https://medium.com/@rinu.gour123/apache-spark-ecosystem-complete-spark-components-guide-f3b57893173e
  6. Lunga, D., Gerrand, J., Yang, L., Layton, C., & Stewart, R. (2020). Apache Spark accelerated deep learning inference for large-scale satellite image analytics. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 13, 271–283.
  7. Myers, D., Mohawesh, R., Chellaboina, V. I., Sathvik, A. L., Venkatesh, P., Ho, Y. H., ... & Jararweh, Y. (2024). Foundation and large language models: Fundamentals, challenges, opportunities, and social impacts. Cluster Computing, 27(1), 1–26.
  8. Prats, D. B., Marcual, J., Berral, J. L., & Carrera, D. (2020). Sequence-to-sequence models for workload interference. arXiv preprint arXiv:2006.14429.
  9. Rajpurohit, A. M., Kumar, P., Kumar, R. R., & Kumar, R. (2023). A review on Apache Spark. Kilby, 100, 7th.
  10. Raviya, K., & Vennila, M. (2021). An implementation of hybrid enhanced sentiment analysis system using Spark ML pipeline: An extensive data analytics framework. International Journal of Advanced Computer Science and Applications, 12(5).
  11. Sahal, R., Breslin, J. G., & Ali, M. I. (2020). Big data and stream processing platforms for Industry 4.0 requirements mapping for a predictive maintenance use case. Journal of Manufacturing Systems, 54, 138–151.
  12. Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. International Journal of Data Science and Analytics, 1, 145–164.
  13. Shetty, S. D. (2021, March). Sentiment analysis, tweet analysis, and visualization of big data using Apache Spark and Hadoop. In IOP Conference Series: Materials Science and Engineering (Vol. 1099, No. 1, p. 012002). IOP Publishing.
  14. Sintia, S., Defit, S., & Nurcahyo, G. W. (2021). Product codification accuracy with cosine similarity and weighted term frequency and inverse document frequency (TF-IDF). Journal of Applied Engineering and Technological Science, 2(2), 14–21.
  15. Stan, C. S., Pandelica, A. E., Zamfir, V. A., Stan, R. G., & Negru, C. (2019, May). Apache Spark and Apache Ignite performance analysis. In 2019 22nd International Conference on Control Systems and Computer Science (CSCS) (pp. 726–733). IEEE.
  16. Steller, M. (2024). Large-scale custom natural language processing (NLP). Microsoft. https://learn.microsoft.com/en-us/azure/architecture/ai-ml/idea/large-scale-custom-natural-language-processing
  17. Survey Point Team (2023). 7 powerful benefits of choosing Apache Spark: Supercharge your data. https://surveypoint.ai/knowledge-center/benefits-of-apache-spark/
  18. Tang, S., He, B., Yu, C., Li, Y., & Li, K. (2020). A survey on Spark ecosystem: Big data processing infrastructure, machine learning, and applications. IEEE Transactions on Knowledge and Data Engineering, 34(1), 71–91.
  19. Tiwari, R. (2023). Simplifying data handling in machine learning with Apache Spark. Medium. https://medium.com/@NLPEngineers/simplifying-data-handling-for-machine-learning-with-apache-spark-e09076d0256e
  20. Verma, D., Singh, H., & Gupta, A. K. (2020). A study of big data processing for sentiments analysis.
  21. Zucco, C., Calabrese, B., Agapito, G., Guzzi, P. H., & Cannataro, M. (2020). Sentiment analysis for mining texts and social networks data: Methods and tools. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(1), e1333.
